REMARKS

Claims 1 and 3 are pending in the present application. Claim 3 has been canceled herein. 
Support for the amendment to claim 1 may be found in original claims 1 and 3. Additionally, 
claim 1 has been amended to positively recite a "computationally filtered set." Support for the
amendment may be found in the specification at page 26, line 26 through page 27, line 10 and 
page 28, lines 15-28. 

Claim Rejection - 35 USC §101:

Claims 1 and 3 have been rejected under 35 U.S.C. 101 as having no specific or well- 
established utility. 

The computational method of the present invention is directed at generating a secondary library 
of scaffold protein variants comprising: a) providing a primary library comprising a 
computationally filtered set of scaffold protein primary variant sequences; b) generating a list of 
primary variant positions in said primary library; c) combining a plurality of said primary variant 
positions to generate a secondary library of secondary sequences and d) synthesizing a 
plurality of said secondary sequences. The arguments set forth in the last Office Action 
regarding the computational method for screening variant sequences were merely supportive of 
the methodology's use. Therefore, there has been no change in the scope of the invention as 
claimed. 
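
By way of illustration only, steps (b) and (c) recited above might be sketched as follows. The function names, the toy four-residue "scaffold", and the choice to allow the wild-type residue at every variant position are assumptions made for this sketch; they are not taken from the specification or the claims.

```python
from itertools import product


def variant_positions(wild_type, primary_library):
    """Step (b): list the positions at which the primary library differs
    from the scaffold, with the variant residues seen at each position."""
    positions = {}
    for seq in primary_library:
        if len(seq) != len(wild_type):
            raise ValueError("all sequences must match the scaffold length")
        for i, (wt, aa) in enumerate(zip(wild_type, seq)):
            if aa != wt:
                positions.setdefault(i, set()).add(aa)
    return positions


def secondary_library(wild_type, positions):
    """Step (c): combine the variant positions. Each position may keep the
    wild-type residue or take any listed variant (an assumption of this sketch)."""
    idx = sorted(positions)
    choices = [[wild_type[i]] + sorted(positions[i]) for i in idx]
    library = []
    for combo in product(*choices):
        seq = list(wild_type)
        for i, aa in zip(idx, combo):
            seq[i] = aa
        library.append("".join(seq))
    return library


# Hypothetical scaffold and computationally filtered primary library.
wild_type = "ACDE"
primary = ["ACDF", "GCDE", "ACDW"]

positions = variant_positions(wild_type, primary)
library = secondary_library(wild_type, positions)
print(len(library), library)
```

In this toy case two variant positions with two and three allowed residues give a secondary library of six sequences, which would then be synthesized in step (d).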

The Examiner asserts that there is "no evidence of record or any line of reasoning that
would support a conclusion that the secondary library was, as of the filing date, useful for any
industrial or any pharmacological uses..." (Action, page 5). Applicants respectfully submit
copies of publications in support of the method and to show that secondary libraries of the present
invention find broad application and are useful. Applicants submit US Patent Nos. 6,627,186
and 6,514,729; Marshall et al., Rational Design and Engineering of Therapeutic Proteins, DDT,
Vol. 8, No. 5, March 2003; Luo et al., Development of a Cytokine Analog with Enhanced
Stability Using Computational Ultrahigh Throughput Screening, Protein Science (2002),
11:1218-1226; Filikov et al., Computational Stabilization of Human Growth Hormone, Protein Science
(2002), 11:1452-1461; and DeGrado, William F., Proteins from Scratch, Science, 3 October
1997, Volume 278, pp. 80-81. These references provide support that the method of the present
invention has specific and well-established utility.

The method of the present invention computationally generates ("screens") protein sequence
libraries to select (produce) smaller libraries of protein sequences. This screening produces 
manageable libraries of proteins that may be synthesized and experimentally tested in an assay 
for a desired activity, for improved function and properties. The library can be computationally 
manipulated again, as the method of the present invention may be iterative, to create a new 
library, which then can be synthesized and experimentally tested, and so on. 

The invention has two broad uses. First, the invention can be used to screen libraries based on
known scaffold proteins. For example, computational screening for stability (or other properties)
may be done on either the entire protein or some subset of residues. By using computational
methods to apply a threshold or cutoff that eliminates disfavored sequences (those not meeting a
criterion for producing a desired characteristic, e.g., stability), the percentage of useful variants in
a given variant set size increases, and the required experimental outlay is decreased.

The method of the present invention is also useful for screening random peptide libraries. 
Signaling pathways in cells often begin with an effector stimulus that leads to a phenotypically 
describable change in cellular physiology. Despite the key role intracellular signaling pathways 
play in disease pathogenesis, in most cases, little is understood about the signaling pathway 
aside from the initial stimulus and the ultimate cellular response. Historically, signal 
transduction has been analyzed by biochemistry or genetics. The biochemical approach 
dissects a pathway in a "stepping-stone" fashion: find a molecule that acts at, or is involved in, 
one end of the pathway, isolate assayable quantities and then try to determine the next 
molecule in the pathway, either upstream or downstream of the isolated one. 

The genetic approach is classically a "shot in the dark": induce or derive mutants in a signaling 
pathway and map the locus by genetic crosses or complement the mutation with a cDNA library. 

Limitations of biochemical approaches include a reliance on a significant amount of pre-existing 
knowledge about the constituents under study and the need to carry such studies out in vitro, 
post-mortem. The limitations of purely genetic approaches include the need to first derive and 
then characterize the pathway before proceeding with identifying and cloning the gene. 

The literature is replete with examples of small peptides capable of modulating a wide variety of 
signaling pathways, which have been screened in in vitro assays for bioactivity. 

Accordingly, generation of random or semi-random sequence libraries of proteins and peptides 
allows for the selection of proteins (including peptides, oligopeptides and polypeptides) with 
useful properties. The sequences in these experimental libraries can be randomized at specific 
sites only, or throughout the sequence. The number of sequences that can be searched in
these libraries grows exponentially with the number of positions that are modified. Generally,
only a small number of sequences can be contained in an experimental library because of the
physical constraints of laboratories (the size of the instruments, the cost of producing large 
numbers of biopolymers, etc.). These limits may be reached by selecting just a few amino acid 
positions to modify. Therefore, only a small sampling of sequences is possible to search for 
improved proteins or peptides in experimental sequence libraries. This lowers the chance of 
success and results in missing desirable variant candidates. Due to the randomness of 
changes in these techniques, most of the candidates in the library are not suitable, resulting in 
waste of much of the effort and resources used to produce the experimental library. 
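
The exponential growth described above can be made concrete with a short calculation. This is a minimal sketch that assumes every modified position may take any of the 20 natural amino acids; the function name is hypothetical.

```python
def library_size(n_positions, n_residues=20):
    """Number of distinct sequences when each of n_positions may take
    any of n_residues amino acids independently."""
    return n_residues ** n_positions


for n in (2, 5, 10):
    print(f"{n} randomized positions -> {library_size(n):,} sequences")

# Restricting each position to fewer residues shrinks the library sharply.
print(f"10 positions, 4 residues each -> {library_size(10, 4):,} sequences")
```

Under this assumption, five fully randomized positions already give 3,200,000 sequences, which is why only a few positions can be varied in a physical library.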

The present invention generates virtual libraries of protein sequences that are vastly larger than 
traditional experimental libraries. Many more candidate sequences may be screened 
computationally and those that meet design criteria, which favor stable and functional proteins, 
may be selected. An experimental library consisting of the favorable candidates found in the 
virtual library screening may then be generated, resulting in more efficient use of the 
experimental library and overcoming the limitations of random protein libraries. 

By limiting the number of randomized positions and the number of possibilities at these 
positions, the number of wasted sequences produced in the experimental library is reduced, 
thereby increasing the likelihood of finding sequences with the desired properties. 

Additionally, by computationally screening large libraries, greater diversity of protein sequences 
may be screened (i.e. a larger sampling of sequence space), leading to greater improvements in 
protein function. Furthermore, fewer variant sequences need to be experimentally tested 
(physically generated) to screen a given library, reducing the cost and difficulty of protein 
engineering. By using computational methods to screen protein libraries, speed and efficiency 
are combined with the ability of experimental library screening to create new activities in 
proteins for which appropriate computational models and structure-function relationships are 
unclear. 

The method of the present invention provides for the biasing of libraries in any number of ways 
(filtered set), allowing the generation of secondary libraries having desired characteristics, e.g., 
improved function or stability. For example, domains, subsets of residues, active or binding 
sites, surface residues, etc. may be selected, thereby increasing the diversity of sequences 
generated by the method of the present invention. 

In addition, Applicants submit that there is adequate support in the Specification at page 5, line
13 through page 8, line 24, for both specific utility and well-established utility.

In light of the above-referenced publications, the support in the specification and Applicants'
discussion above, Applicants submit that the method and resulting secondary libraries have
both a specific and well-established utility.

Claim Rejection - 35 USC §112, First Paragraph:

Claims 1 and 3 are rejected under 35 USC §112, first paragraph, because the specification, while
enabling for the protein design of enzymes using a specific design program, does not reasonably
provide enablement for any type of secondary library of scaffold protein variants or sequences.
Claim 3 has been canceled herein, making the rejection moot as to that claim.

Applicability of the method is not speculative or unexplored, as the Action suggests. The Action
states that the accuracy of the statements in the specification and claims must be sufficiently
supported by well-established chemical principles or by a sufficient number of examples. The
present application is founded on well-established principles of chemistry. The present
invention utilizes well-established protein design methods, which are founded on basic
principles of chemistry, e.g., it utilizes structural and biophysical knowledge of proteins. See, for
example, Specification at page 1, lines 11-25.

The Action cites the "high unpredictability of the newly emerging biolibrary art..." (page 9), referring to page
1, lines 13-25 and page 6, line 3 to page 7, line 3. Applicants respectfully submit that the cited
references to "biolibraries" are distinguishable from the present invention because the cited
sections discuss non-rational methods in current use for screening libraries of mutants,
which are highly unpredictable.

The present invention is clearly distinguishable from the "directed molecular evolution"
mutagenesis techniques referenced in the Action because, as stated above, it is a rational
design technique and does not rely on a randomly generated library, as directed molecular
evolution does.

The present method may further be distinguished from the referenced biochemical and genetic
methods, again because it generates secondary library members in a rational way. The
present invention allows the screening of a substantially greater number of variants because it
utilizes structural and biophysical knowledge of the target proteins (well-established principles of
chemistry). See, for example, Specification at page 1, lines 11-25.

As stated above, the present invention provides significantly diverse results because it is a
rational method. The directed molecular evolution techniques have limited diversity because
their libraries are generated using multiple sequences whose fragments are mixed together and
re-assembled based on pre-identified cross-over points (sections of sequence homology among
the initial sequences). Thus, diversity is limited to mixtures of the initial sequences chosen.

The enablement requirement refers to the requirement of 35 USC 112, first paragraph, that the
specification describe how to make and how to use the invention. The invention that one skilled in the
art must be enabled to make and use is that defined by the claim(s) of the particular application or patent.
Applicants submit that in light of the distinction between rational and non-rational protein design
methods and the above arguments, the method of the present invention is enabled by the
disclosure.

In conclusion, Applicants submit that the Specification taken in conjunction with the state of the 
art at the time the invention was filed and the evidence in support of the broad applicability of 
the method fully enables a person skilled in the art to practice the invention without undue 
experimentation. Applicants respectfully request reconsideration and withdrawal of the rejection 
of the claim. 

Claim Rejection - 35 USC §112, Second Paragraph:

Claims 1 and 3 are rejected under 35 USC 112, second paragraph, as being indefinite for failing 
to particularly point out and distinctly claim the subject matter which applicant regards as the 
invention for reasons of record. Claim 3 has been canceled herein, so only claim 1 will be 
addressed below. 

Applicants submit that the filtered set is not necessarily obtained by a rank ordered list or by a 
scoring function. In some cases the user of the method will use selection criteria that may or 
may not be a rank ordered list of sequences, as that is only one embodiment. Other criteria 
may be used to bias the list. In response, Applicants disagree with the limitation of a rank
ordered list. See Specification beginning on page 26, line 26 through page 27, line 10.

In light of the foregoing argument, Applicants respectfully request reconsideration and withdrawal
of the rejection of the claim.

Claim Rejection - 35 USC §102:

Claims 1 and 3 are rejected under 35 USC 102 as being anticipated by Fechteler et al.

The Fechteler reference is a homology modeling paper about predicting protein structure in 
regions of insertions and deletions. This reference is directed to designing a protein model by 
homology and predicting what structure the sequence would adopt. The reference does not
retain the sequence information because it is focused on the backbone structure.

"A claim is anticipated only if each and every element as set forth in the claim is found, either 
expressly or inherently described, in a single prior art reference." Verdegaal Bros. v. Union Oil 
Co. of California, 814 F.2d 628, 631, 2 USPQ2d 1051, 1053 (Fed. Cir. 1987). "The identical 
invention must be shown in as complete detail as is contained in the ... claim." Richardson v. 
Suzuki Motor Co., 868 F.2d 1226, 1236, 9 USPQ2d 1913, 1920 (Fed. Cir. 1989).

The present invention may be distinguished from the cited reference because there is no 
suggestion or teaching of synthesizing variants of a secondary library. The present invention is 
not directed to predicting theoretical 3D structures based solely on homology models and does 
not rely on insertion and deletion regions. Therefore, the claim as amended is not anticipated
by the cited reference because each and every element as set forth in the claim is not found,
either expressly or inherently described, in the reference. In light of the foregoing, Applicants
respectfully request reconsideration and withdrawal of the claim rejections.

DOUBLE PATENTING: 

Claims 1 and 3 are rejected under the judicially created doctrine of obviousness-type double 
patenting as being unpatentable over claims 1-2 of US 6,403,312.

Applicants respectfully submit that a terminal disclaimer is being filed herewith for the 
Examiner's review, thereby making the double patenting rejection moot. 

The Applicants submit that in light of the above amendment and arguments and the submission of a
terminal disclaimer, the claims are now in condition for allowance, and an early notification of
such is respectfully solicited.

Abstract

Recombinant human growth hormone (hGH) is used worldwide for the treatment of pediatric hypopituitary 
dwarfism and in children suffering from low levels of hGH. It has limited stability in solution, and because 
of poor oral absorption, is administered by injection, typically several times a week. Development has
therefore focused on more stable or sustained-release formulations and alternatives to injectable delivery
that would increase bioavailability and make it easier for patients to use. We redesigned hGH computationally
to improve its thermostability. A more stable variant of hGH could have improved pharmacokinetics
or enhanced shelf-life, or be more amenable to use in alternate delivery systems and formulations. The
computational design was performed using a previously developed combinatorial optimization algorithm
based on the dead-end elimination theorem. The algorithm uses an empirical free energy function for scoring
designed sequences. This function was augmented with a term that accounts for the loss of backbone and
side-chain conformational entropy. The weighting factors for this term, the electrostatic interaction term,
and the polar hydrogen burial term were optimized by minimizing the number of mutations designed by the
algorithm relative to wild-type. Forty-five residues in the core of the protein were selected for optimization
with the modified potential function. The proteins designed using the developed scoring function contained 
six to 10 mutations, showed enhancement in the melting temperature of up to 16°C, and were biologically 
active in cell proliferation studies. These results show the utility of our free energy function in automated 
protein design. 

Keywords: Protein design; free energy; entropy; human growth hormone; thermostability 



Human growth hormone (hGH) is a polypeptide hormone 
that is synthesized by the somatotropic cells of the anterior 
pituitary. It plays an important role in somatic growth 
through its effects on the metabolism of proteins, carbohy- 
drates, and lipids. hGH is currently used for the treatment of 
pediatric hypopituitary dwarfism and in children suffering 
from low levels of hGH (Hindmarsh and Brook 1987). It is 
believed that hGH functions by direct action on bone and
soft tissue to cause uniform growth and by indirect stimulation
of insulin-like growth factor-1 (Pearlman and Bewley
1993).

The most prevalent form of pituitary hGH is a single-chain
polypeptide containing 191 amino acids, internally
cross-linked by two disulfide bonds. The molecular mass is
~22 kD, with pI near 5.3. Approximately 55% of the polypeptide
backbone exists in a right-handed α-helical conformation.
The hormone is a four-helix bundle showing an
up-up-down-down topology. Activation of transmembrane
receptors for hGH (hGHbp) occurs when dimerization of
receptor chains is triggered by binding of hGH to a ligand-binding
domain on the receptor. The crystal structure of the
wild-type hGH in 1:2 complex with its receptor was determined
to 2.8-Å resolution (de Vos et al. 1992). There are
other crystal structures of the protein, its mutants, and complexes
available in the literature and the Protein Data Bank
(PDB; Sundstrom et al. 1996; Atwell et al. 1997; Clackson
et al. 1998).

Met-hGH and hGH are produced recombinantly and are 
available worldwide for clinical use. Both forms have thera- 
peutic activity that is equivalent to the pituitary-derived ma- 
terial (Jorgensen 1987). Because hGH is a protein, it is not 
absorbed orally to any significant extent (Moore et al. 1986)
and must be administered by injection. It is typically given
subcutaneously or intramuscularly several times a week
over an extensive period. It has limited stability in solution
(for ~2 weeks at 2°C to 8°C) and is commonly stored in
freeze-dried form. Development has therefore focused on
more stable or sustained-release formulations and alternatives
to injectable delivery that would increase bioavailability
and make it easier for patients to use. In this study, we
have redesigned hGH computationally to improve its thermostability.
A more thermostable variant of hGH could
have improved utilization time or a longer shelf-life, which
would translate into decreased costs for the manufacturer
and added convenience and compliance for patients. Thermostability,
described in terms of Tm, or the denaturation
temperature of unfolding, has been used to predict the best
long-term storage conditions for protein pharmaceuticals
(Schrier et al. 1993; Remmele et al. 1998). A more stable
variant of hGH could also have improved pharmacokinetics
or be more amenable to use in alternative delivery systems
and formulations.

There are two components required for computational 
design: (1) accurate scoring functions to rank sequences and 
(2) high-speed optimization methods to rapidly find the best 
sequences from the enormous combinatorial search space 
(Dahiyat 1999). We use our Protein Design Automation 
(PDA) method (Dahiyat and Mayo 1996, 1997a; Dahiyat et 
al. 1997), which incorporates the dead-end elimination 
(DEE) algorithm (Desmet et al. 1992; Goldstein 1994). Using
a rotamer description of the side-chains, an optimal
sequence for a backbone can be found by screening all 
possible sequences of rotamers, in which each backbone 
position can be occupied by each amino acid in all possible 
rotameric states. 
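
To make the pruning idea behind dead-end elimination concrete, the following is a minimal sketch of the original elimination criterion: a rotamer is discarded when its best possible energy is still worse than the worst possible energy of a competing rotamer at the same position. It is not the PDA implementation; the data layout (`self_e[i][r]`, `pair_e[i][j][r][s]`) and the toy energies are assumptions made for illustration.

```python
def dead_end_elimination(self_e, pair_e):
    """Prune rotamers that cannot be part of the global energy minimum.

    self_e[i][r]       -- self energy of rotamer r at position i
    pair_e[i][j][r][s] -- pair energy of rotamer r at i with rotamer s at j
    Returns a list with the set of surviving rotamer indices per position.
    """
    n = len(self_e)
    alive = [set(range(len(rotamers))) for rotamers in self_e]

    def bound(i, r, pick):
        # Self energy plus, for each other position, the best (min) or
        # worst (max) pair energy over the rotamers still alive there.
        return self_e[i][r] + sum(
            pick(pair_e[i][j][r][s] for s in alive[j])
            for j in range(n) if j != i
        )

    changed = True
    while changed:  # repeat until a full pass removes nothing
        changed = False
        for i in range(n):
            for r in sorted(alive[i]):
                rivals = [t for t in sorted(alive[i]) if t != r]
                if any(bound(i, r, min) > bound(i, t, max) for t in rivals):
                    alive[i].discard(r)
                    changed = True
    return alive


# Toy system: two positions, two rotamers each (invented energies).
self_e = [[0.0, 5.0], [0.0, 1.0]]
pair_e = {
    0: {1: [[-1.0, 0.0], [0.5, 1.0]]},
    1: {0: [[-1.0, 0.5], [0.0, 1.0]]},
}
alive = dead_end_elimination(self_e, pair_e)
print(alive)
```

In this toy case a single rotamer survives at each position, so the global minimum is identified without enumerating all combinations. The Goldstein (1994) criterion cited above is a stronger variant of the same idea and is not shown here.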

The scoring functions used for protein design were 
recently reviewed by Gordon et al. (1999). Although 
nonenergy terms such as secondary structure propen- 
sities can be used, the most successful designs use 
energy functions based on molecular mechanics force 
field terms (van der Waals, hydrogen bonding, electro- 
statics, bond and angle energy), that is, potential energy 
terms, or their combinations with free energy terms 
such as solvation (Dahiyat and Mayo 1996) or entropy 
(Hellinga and Richards 1994; Dahiyat and Mayo 1996; 
Kono et al. 1998). Here we use a previously developed 
scoring function that includes potential energy terms (van 
der Waals, hydrogen bonding, electrostatics) and solvation 
terms (polar hydrogen burial and nonpolar exposure penal- 
ties, nonpolar burial energy) augmented with a term that 
accounts for the loss of backbone and side-chain conforma- 
tional entropy. Before side-chain selection, residues are 
identified as core, surface, or boundary using the RES- 
CLASS residue classification program (Dahiyat and Mayo 
1997a). 



Combining potential energy and free energy terms to es- 
timate the free energy of folding or binding assumes both 
additivity and proportionality of potential energy and free 
energy terms. This necessarily raises the question of proper 
weighting factors for the terms. In the simplest treatment, 
the weighting factors are assumed to be equal to one (Kono 
et al. 1998). Alternatively, the factors can be derived by 
regression to experimental free energy data (Dahiyat and 
Mayo 1996; Filikov and James 1998; Filikov et al. 2000). 
Here we use a different approach: The weighting factors are 
optimized by minimizing the number of mutations designed 
by the algorithm. 

The loss of entropy on formation of the folded protein is 
believed to be the principal force opposing folding (Stites 
and Pranata 1995). Therefore, inclusion of side-chain and 
main-chain entropy terms into the scoring function with 
proper weighting factors could improve scoring of designed 
sequences. A side-chain entropy term has been incorporated
into protein design energy functions previously (Hellinga 
and Richards 1994; Kono et al. 1998). The change in side- 
chain entropy on folding can be modeled as the change in 
the number of rotatable bonds, assuming that conforma- 
tional freedom is completely restricted in the folded state 
(Hellinga and Richards 1994). An empirical approach is 
based on the entropy of fusion of small organic compounds 
(Sternberg and Chickos 1994). Alternatively, the change in 
entropy can be derived from the distribution of side-chain 
rotamers in crystal structures (Pickett and Sternberg 1993) 
or in Monte Carlo simulations (Creamer and Rose 1992; 
Creamer 2000). These and other methods of estimating con- 
formational entropy have been described recently (Creamer 
2001) and were shown to correlate extremely well, despite 
different methods of derivation. In this study, we use both 
side-chain and backbone entropy terms based on scales in- 
troduced by Pickett and Sternberg (1993) and by Stites and 
Pranata (1995), respectively (Table 1). 

Results 

The scoring function used in this work is a sum of the 
following terms: van der Waals interaction, hydrogen bond 
potential, distance-dependent Coulombic electrostatics, po- 
lar hydrogen burial penalty, nonpolar burial energy, nonpo- 
lar exposure penalty, and entropy. A detailed description of 
all the terms, except the entropy term, is given elsewhere 
(Dahiyat and Mayo 1997a,b). Here we optimize the weighting
factors for the entropy (λ_s) and polar hydrogen burial
penalty (ΔG) terms, and the dielectric constant (ε) for the
electrostatic term. In the following sections, we describe
independent optimization of each parameter, beginning with 
the entropy term. Although simultaneous optimization of 
the three weighting factors is a possibility, such an approach 
is often problematic because of correlations between param- 
eters. Here, by focusing on different sets of residue classes 
that are predominantly dependent on one of the parameters,
we are able to optimize each parameter independently, thus
minimizing the possibility of spurious results. Furthermore,
simultaneous optimization is significantly more computationally
intensive because it requires that a much more extensive
set of calculations be performed and analyzed.

Table 1. Values of TΔS (kcal/mol) for the amino acids at 20°C

Amino acid   Side-chain TΔS(a)   Backbone TΔS(b)   Total TΔS(c)
Ala           0.0                -0.71             -1.21
Arg          -2.03               -0.51             -3.44
Asn          -1.57               -0.18             -3.31
Asp          -1.25               -0.29             -2.88
Cys          -0.55               -0.29             -2.18
Gln          -2.11               -0.48             -3.55
Glu          -1.81               -0.64             -3.09
Gly           0.0                 0.0              -1.92
His          -0.96               -0.21             -2.67
Ile          -0.89               -0.59             -2.22
Leu          -0.78               -0.55             -2.15
Lys          -1.94               -0.42             -3.44
Met          -1.61               -0.51             -3.02
Phe          -0.58               -0.31             -2.19
Pro           0.0                -0.82             -1.10
Ser          -1.71               -0.28             -3.35
Thr          -1.63               -0.29             -3.26
Trp          -0.97               -0.44             -2.45
Tyr          -0.98               -0.32             -2.58
Val          -0.51               -0.57             -1.86

(a) Taken from Pickett and Sternberg (1993).
(b) Taken from Stites and Pranata (1995).
(c) Obtained by summing up the side-chain scale and the backbone scale,
corrected for the glycine backbone entropy loss (-1.92 kcal/mole) taken
from D'Aquino et al. (1996): TΔS = TΔS(side-chain) + (-1.92 - TΔS(backbone)).
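
The footnote formula for the total TΔS column of Table 1 can be checked numerically. The sketch below copies four rows from the table and recomputes the total; the dictionary and function names are hypothetical.

```python
import math

GLY_BACKBONE_TDS = -1.92  # kcal/mol, glycine backbone entropy loss (Table 1 footnote)

# Side-chain and backbone TΔS values (kcal/mol) copied from Table 1.
SIDE_CHAIN_TDS = {"Ala": 0.0, "Gly": 0.0, "Met": -1.61, "Val": -0.51}
BACKBONE_TDS = {"Ala": -0.71, "Gly": 0.0, "Met": -0.51, "Val": -0.57}


def total_tds(residue):
    """Total TΔS = side-chain TΔS + (-1.92 - backbone TΔS)."""
    return SIDE_CHAIN_TDS[residue] + (GLY_BACKBONE_TDS - BACKBONE_TDS[residue])


for res in ("Ala", "Gly", "Met", "Val"):
    print(f"{res}: {total_tds(res):.2f} kcal/mol")
```

The recomputed values match the tabulated totals for these residues, which confirms the sign convention used in the footnote.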

Weighting factor for the entropy term 

We optimized the weighting factor for the entropy term by 
minimizing the number of mutations designed by the algo- 
rithm. This approach is based on the assumption that the 
wild-type sequence is reasonably close to the global energy 
minimum (GEM) in the sequence space of a particular fold 
(Kuhlman and Baker 2000), and by minimizing the distance 
from the wild-type sequence, we minimize the distance
from the GEM sequence. The wild-type sequence often is 
not the GEM sequence, because stabilizing mutations for 
numerous proteins are known. For example, in this work we 
find two sequences that are considerably more thermostable 
than the wild type. Without knowing the GEM sequence, 
however, a reasonable option is to use the wild-type se- 
quence as a target for optimization of the algorithm param- 
eters. To derive and validate a broadly applicable parameter 
set, a number of different proteins should be used to opti- 
mize parameter values. Here we derive a parameter set 
based only on hGH, which will be tested more extensively 
in future work. 



We selected 45 residues buried in the core of hGH
for entropy calculations. Residue classification with
RESCLASS gives 71 core residues for hGH (PDB structure
3HHR). To make the calculations faster and to focus the
optimization on residues for which the entropy term is isolated
from the electrostatic and polar hydrogen energies, we
reduced this list to 45 positions by eliminating residues
involved in hydrogen bonds and residues with significant
exposure to the solvent. Several rounds of design were performed
with PDA using different weighting factors for the
entropy term in the range of one to four. For each round of
design, the DEE algorithm was run to completion; that is,
the global energy minimum sequence (GEMS) was identified.
The number of mutations contained in the GEMS
strongly depends on the entropy term weighting factor and
has a clear minimum centered at 2.2 (Table 2; Fig. 1). At
smaller entropy weighting factors (<1.7), the GEMS tends
to contain a lot of methionine residues (methionine is very
flexible and can fill cavities of a wide variety of shapes).
Simultaneously, the loss of total entropy on folding (TΔS)
for methionine is very high (-3.02 kcal/mole). Frequent
appearance of methionines in the GEMS is the most obvious
consequence of neglecting the entropy term in the scoring
function. At higher entropy weighting factors (>3), the
GEMS tends to have larger residues mutated to smaller
ones, that is, to less entropically rich residues: Ile->Val,
Leu->Ala, and Met->Ala (see Table 2). The optimal value
for the entropy weighting factor is in the range of 1.7 to 2.7,
as can be seen from Figure 1.
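
The selection rule used here, choosing the weighting factor whose global energy minimum sequence differs least from wild type, can be sketched as below. The sequences are invented five-residue toys, not hGH design results, and the function names are hypothetical.

```python
def count_mutations(wild_type, designed):
    """Number of positions at which the designed sequence differs from wild type."""
    if len(wild_type) != len(designed):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(wild_type, designed))


def best_weights(wild_type, designs_by_weight):
    """Return the mutation count per weight and the weight(s) with the fewest mutations."""
    counts = {w: count_mutations(wild_type, seq) for w, seq in designs_by_weight.items()}
    fewest = min(counts.values())
    return counts, sorted(w for w, c in counts.items() if c == fewest)


# Invented example: one designed sequence per entropy weighting factor.
wild_type = "LFAIM"
designs = {1.0: "MFMIM", 2.0: "LFAVM", 3.0: "LFAVA", 4.0: "AFAVA"}

counts, best = best_weights(wild_type, designs)
print(counts, best)
```

The toy data mimic the qualitative trend described above: extra methionines at low weights and substitutions to smaller residues at high weights, with the minimum in between.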

Weighting factor for the electrostatic term 

The same approach was used for optimization of the weight- 
ing factor for the electrostatic term. However, the set of core 
residues used for the entropy calculations cannot be used to 
optimize the electrostatics term, because there are few polar 
residues in the core. For these calculations, we selected a set 
of 28 boundary residues. These were obtained by running 
the RESCLASS algorithm, which gives 41 boundary residues
for hGH, and eliminating the residues within 5 Å of the
receptor and Gly 104, because it has unusual φ and ψ angles.
The result is a set of 28 residues that are predominantly 
buried: The solvent accessible fraction of the residue sur- 
face is 32.7% on average. Therefore, we assume that for 
these residues all the entropy of the unfolded state is lost on 
folding and treat them no differently from core residues in 
this respect. 

Ten rounds of design were performed using different val- 
ues of the dielectric constant for the electrostatic term in the 
range of 5R to 40R, where R is the interatomic distance 
(varying e is equivalent to varying the weighting factor). 
The weighting factor for the entropy term was set to 2.3, the 
midpoint of the optimal values found previously (Fig. 1).
For each round of design, the DEE algorithm was run to 
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Table 2. Global energy minimum sequences found by PDA for 
different entropy weighting factors (\J 

K 





Wild 


Position" 


type 


6 


Leu 


10 


Phe 


13 


Ala 


17 


Ala 


20 


Leu 


24 


Ala 


27 


Thr 


28 


Tyr 


3] 


Phe 


36 


He 


44 


Phe 


54 


Phe 


55 


Ser 


58 


He 


■73 


*'Leu 


75 


Leu 


76 


Leu 


78 


He 


79 


Ser 


80 


Leu 


8] 


Leu 


82 


Leu 


83 


He 


85 


Ser 


90 


Val 


93 


Leu 


96 


Val 


97 


Phe 


105 


Ala 


110 


Val 


114 


Leu 


117 


leu 


11 


He 


124 


Leu 


157 


Uu 


161 


Gly 


162 


Leu 


163 


Leu 


166 


Phe 


170 


Met 


173 


Val 


176 


Phe 


177 


Leu 


180 


Val 


184 


Ser 



1 


1.4 


1.7 


2 


2.3 


2.7 


3 


4 


b 

Val 




Val 




.Val 




Val 


Val 


Val 


Val 


Val 


Met 


Mel 












He 


Val 


Val 


Val 


Val 


Val 


Val 


Val 


Val 


Phe 


Phe 


Phe 


Phe 


Phe 


Phe 


Phe 


Phe 


Tyr 


Tyr 


Tyr 


Tyr 


Tyr 


Tyr 


Tyr 


Tyr 


Ala 


Ala 


Ala 


Ala 


Ala 


Ala 


Ala 


Ala 














Val 


Val 



■■ — ■ Ala - Ala 



Ala Ala Ala Ala Ala Ala Ala Ala 



________ Val 

Ala Ala Ala Ala Ala Ala Ala Ala 
He He — — — _ _ _ 



Met Met — — — — — — 

Met Met Met Met Met Met — Phe 

Met Met — — — — — 

Val Val — — _ _ _ _ 



Met Met Met Met Met Met Met Met 



Leu' Leu Leu Leu Leu Leu Met Leu 
— — . — — — — Leu Ala 



Ala Ala Ala Ala Ala Ala Ala Ala 

a The residues are numbered as in the 3HHR file from the Brookhaven
Protein Data Bank.

^ " — " indicates the wild-type residue. 

completion. The number of mutations contained in the 
GEMS is plotted versus the dielectric constant in Figure 2. 
As can be seen, the curve has a distinct minimum at 
ε/R = 10.3 ± 0.9.
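
One way to read a value such as 10.3 ± 0.9 off a curve with a flat minimum is to take the center and half-width of the range where the mutation count is lowest. The sketch below does that; the text does not say how the uncertainty was derived, so this is an assumption, and the mutation counts are illustrative rather than reported data.

```python
def plateau_minimum(xs, counts):
    """Return (center, half_width) of the x-range where counts is minimal."""
    lowest = min(counts)
    at_min = [x for x, c in zip(xs, counts) if c == lowest]
    lo, hi = min(at_min), max(at_min)
    return (lo + hi) / 2, (hi - lo) / 2

# Dielectric scale factors as listed for Table 3; counts are illustrative.
eps_over_r = [40, 20, 15, 12.5, 11.25, 10, 9.37, 8.75, 7.5, 5]
n_mutations = [17, 17, 15, 15, 14, 14, 14, 15, 15, 17]

center, half_width = plateau_minimum(eps_over_r, n_mutations)
print(f"{center:.1f} +/- {half_width:.1f}")
```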

Fig. 1. Number of mutations versus weighting factor for the entropy term.

At low dielectric constants, PDA tends to place charged
or polar residues; at higher constants, these mutate to
wild-type or non-wild-type uncharged or apolar residues (see
Table 3). Examples of this trend include positions 34, 35,
71, 84, and 157. On the other hand, at high constants, some
charged positions, including the wild-type ones, mutate to
apolar amino acids. Examples of this trend include positions
74 and 118. Superposition of these two trends results in a
curve with a minimum at ε/R = 10.3 ± 0.9.
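
To see why a low dielectric favors charged residues, it helps to write out a Coulomb term with a distance-dependent dielectric ε = kR. The sketch below assumes the standard Coulomb form and a commonly used conversion constant; neither is taken from the text, and the salt-bridge geometry is invented.

```python
COULOMB_CONSTANT = 332.0637  # kcal*angstrom/(mol*e^2), a commonly used value

def coulomb_energy(q1, q2, r, k):
    """Coulomb energy in kcal/mol with distance-dependent dielectric eps = k*r."""
    eps = k * r
    return COULOMB_CONSTANT * q1 * q2 / (eps * r)

# A +1/-1 charge pair at 4 angstroms: the energy shrinks in proportion to 1/k.
for k in (5, 10.3, 40):
    print(k, round(coulomb_energy(+1.0, -1.0, 4.0, k), 3))

ratio = coulomb_energy(1.0, -1.0, 4.0, 5) / coulomb_energy(1.0, -1.0, 4.0, 40)
```

Because k only rescales the term, changing it acts like a weighting factor on the electrostatic energy: at k = 5 a favorable charge pair is worth eight times what it is at k = 40.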

Fig. 2. Number of mutations versus dielectric constant.

Table 3. Global energy minimum sequences found by PDA for different dielectric
constants (ε)

                       ε/R, R = interatomic distance
Position(a) Wild type  40    20    15    12.5  11.25 10    9.37  8.75  7.5   5
6           Leu        -     -     -     -     -     -     -     -     -     -
14          Met        Phe   Phe   Leu   Leu   Leu   Leu   Leu   Leu   Leu   Leu
26          Asp        -     -     -     -     -     -     -     -     -     -
30          Glu        Trp   Trp   Trp   Trp   Trp   Trp   Trp   Trp   Trp   Trp
32          Glu        -     -     -     -     -     -     -     -     -     -
34          Ala        -     -     -     -     -     -     -     Ser   Ser   Hsp
35          Tyr        -     -     -     -     -     -     -     -     -     Asp
40          Gln        Trp   Trp   Trp   Trp   Trp   Trp   Trp   Trp   Trp   Trp
50          Thr        Met   Met   Met   Met   Met   Met   Met   Met   Met   Met
56          Glu        -     -     -     -     -     -     -     -     -     -
57          Ser        Tyr   Tyr   Tyr   Tyr   Tyr   Tyr   Tyr   Tyr   Tyr   Tyr
59          Pro        Val   Val   Val   Val   Val   Val   Val   Val   Val   Val
66          Glu        -     -     -     -     -     -     -     -     -     -
71          Ser        Thr   Thr   Hsp   Hsp   Hsp   Hsp   Hsp   Hsp   Hsp   Hsp
74          Glu        Phe   Phe   -     -     -     -     -     -     -     -
84          Gln        Ile   Ile   Ile   Ile   Ile   Ile   Ile   Ile   Lys   Lys
92          Phe        -     -     -     -     -     -     -     -     -     -
107         Asp        Ala   Ala   Ala   Ala   -     -     -     -     -     -
109         Asn        Phe   Phe   Phe   Phe   Phe   Phe   Phe   Phe   Phe   Phe
113         Leu        -     -     -     -     -     -     -     -     -     -
118         Glu        Phe   Phe   -     -     -     -     -     -     -     -
125         Met        -     -     -     -     -     -     -     -     -     -
130         Asp        His   His   His   His   His   His   His   His   His   His
139         Phe        Ala   Ala   Ala   Ala   Ala   Ala   Ala   Ala   Ala   Ala
143         Tyr        Ala   Ala   Ala   Ala   Ala   Ala   Ala   Ala   Ala   Ala
157         Leu        -     -     -     -     -     -     -     -     -     Hsp
158         Lys        Phe   Phe   Phe   Phe   Phe   Phe   Phe   Phe   Phe   Phe
183         Arg        Hsp   Hsp   Hsp   Hsp   Hsp   Hsp   Hsp   Hsp   Hsp   Hsp

(a) The residues are numbered as in the 3HHR file from the Brookhaven Protein Data Bank.
"-" indicates the wild-type residue.

Polar hydrogen burial penalty term

To optimize the polar hydrogen burial penalty term, we ran
11 rounds of design with values of the penalty from 0 to 3
kcal/mole. The dielectric constant was set to 10.3R, the
optimal value obtained previously (Fig. 2). The entropy
term weighting factor and other parameters were the same
as in the optimization of the dielectric constant, as were the
residues selected for design (28 boundary residues). The
number of mutations contained in the GEMS is plotted versus
the polar hydrogen burial penalty in Figure 3. The optimal
value for the penalty is 1.6 ± 0.6 kcal/mole.

At low values of the penalty, charged or polar residues
appear at some positions and become apolar or less polar as
the penalty increases (see Table 4). This is the case for
positions 40, 57, 71, 84, 139, 143, and 157. At high values
of the penalty (ΔGh ≥ 2.5), wild-type Glu at position 74
mutates to Phe; that is, a charged residue mutates to an
apolar one. Superposition of these two trends results in a
curve with a minimum at 1.6 ± 0.6 kcal/mole.

Table 4. Global energy minimum sequences found by PDA for different polar hydrogen burial
penalties (ΔGh)

                       ΔGh (kcal/mole)
Position(a) Wild type  0     0.5   0.75  1     1.25  1.5   1.75  2     2.25  2.5   3
6           Leu        -     -     -     -     -     -     -     -     -     -     -
14          Met        Leu   Leu   Leu   Leu   Leu   Leu   Leu   Leu   Leu   Leu   Leu
26          Asp        -     -     -     -     -     -     -     -     -     -     -
30          Glu        Trp   Trp   Trp   Trp   Trp   Trp   Trp   Trp   Trp   Trp   Trp
32          Glu        -     -     -     -     -     -     -     -     -     -     -
34          Ala        -     -     -     -     -     -     -     -     -     -     -
35          Tyr        -     -     -     -     -     -     -     -     -     -     -
40          Gln        Arg   Arg   Arg   Arg   Arg   Arg   Arg   Arg   Arg   Arg   Arg
50          Thr        Phe   Phe   Phe   Met   Met   Met   Met   Met   Met   Met   Met
56          Glu        -     -     -     -     -     -     -     -     -     -     -
57          Ser        Tyr   Tyr   Tyr   Tyr   Tyr   Tyr   Tyr   Tyr   Ala   Ala   Ala
59          Pro        Val   Val   Val   Val   Val   Val   Val   Val   Val   Val   Val
66          Glu        -     -     -     -     -     -     -     -     -     -     -
71          Ser        Hsp   Hsp   Hsp   Hsp   Hsp   Hsp   Hsp   Hsp   Hsp   Thr   Thr
74          Glu        -     -     -     -     -     -     -     -     -     Phe   Phe
84          Gln        Arg   Lys   Lys   Lys   Lys   Lys   Ile   Ile   Ile   Ile   Ile
92          Phe        -     -     -     -     -     -     -     -     -     -     -
107         Asp        -     -     -     -     -     -     -     -     -     -     -
109         Asn        Phe   Phe   Phe   Phe   Phe   Phe   Phe   Phe   Phe   Tyr   Tyr
113         Leu        -     -     -     -     -     -     -     -     -     -     -
118         Glu        Leu   -     -     -     -     -     -     -     -     -     -
125         Met        -     -     -     -     -     -     -     -     -     -     Val
130         Asp        His   His   His   His   His   His   His   His   His   His   His
139         Phe        His   His   His   His   His   His   His   Ala   Ala   Ala   Ala
143         Tyr        Ala   Ala   Ala   Ala   Ala   Ala   Ala   Ala   Ala   Ala   Ala
157         Leu        Arg   Arg   Arg   -     -     -     -     -     -     -     -
158         Lys        Phe   Phe   Phe   Phe   Phe   Phe   Phe   Phe   Phe   Phe   Phe
183         Arg        Hsp   Hsp   Hsp   Hsp   Hsp   Hsp   Hsp   Hsp   Hsp   Hsp   Hsp

(a) The residues are numbered as in the 3HHR file from the Brookhaven Protein Data Bank.
"-" indicates the wild-type residue.

Redesign of the core of hGH

To enhance the thermostability of hGH, we used PDA to
computationally redesign 45 residues in the core of the
protein (the same set that was used in the entropy weighting
factor optimization). We used the parameters optimized as
described above: entropy term with weighting factor of 2.3,
penalty for polar hydrogen burial of 1.6 kcal/mole, and
dielectric constant of 10.3R. The surface-based nonpolar
exposure penalty and nonpolar burial benefit were set to 0.048
kcal/mole/Å². The calculation resulted in 11 mutations
(Table 5). We selected sequences for experimental testing
by ranking the mutations according to their contribution to
lowering the energy of the wild-type sequence. The six
highest-ranking mutations were selected for the CORE1
sequence; two others were added to obtain CORE2; and
another two were added to obtain CORE3 (Table 5). The
model structure and 10 mutations of the CORE3 protein are
shown in Figure 4.
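
The selection scheme just described (rank mutations by how much each lowers the energy, then take the top six, eight, and ten) can be sketched as follows. The mutation labels follow Table 5, but the energy contributions are invented for illustration, and `nested_variants` is a hypothetical helper, not part of PDA.

```python
def nested_variants(contributions, sizes):
    """Rank mutations by energy contribution (most negative first) and
    return one list per requested size, each containing the previous one."""
    ranked = sorted(contributions, key=contributions.get)
    return [ranked[:n] for n in sizes]

# Hypothetical energy contributions (kcal/mol); not values from the study.
contributions = {
    "A13V": -2.1, "T27V": -1.9, "S79A": -1.7, "V90I": -1.6,
    "G161M": -1.5, "S184A": -1.4, "S55A": -1.0, "S85A": -0.9,
    "Y28F": -0.6, "F54Y": -0.5, "L114M": -0.2,
}
core1, core2, core3 = nested_variants(contributions, sizes=(6, 8, 10))
print(core1)
print(sorted(set(core2) - set(core1)))
print(sorted(set(core3) - set(core2)))
```

Building the variants as nested sets means each step adds lower-ranked mutations to a background that has already been tested.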

Thermal stability 

CORE1, CORE2, and CORE3 proteins and wild-type hGH
were expressed and isolated as described in Materials and
Methods. The far-ultraviolet circular dichroism spectra for
the proteins were nearly identical to each other and to the
wild-type protein, indicating highly similar secondary
structure and tertiary folds (data not shown). Thermal
denaturation was monitored at 222 nm for wild-type hGH, CORE1,
and CORE2 (Fig. 5; data were not obtained for CORE3).
The melting temperatures (Tm values) were estimated
graphically by finding the midpoints of the transition region of
the melting curve. Because the Tm values for the mutants are
close to 100°C and the ends of the transition regions of the
curves are beyond the experimental range, only the lower bounds
of the Tm values can be estimated. This gives the following
values: wild-type Tm = 82°C, CORE1 Tm > 98°C, and
CORE2 Tm > 95°C. The designed proteins thus showed
enhancements of 13°C to 16°C. It should be noted that
thermal melting was not reversible as measured; therefore,
the Tm values given here are not rigorous thermodynamic
parameters. However, these values are indicative of the
improved thermostability of the designed proteins.

Biological activity 

The biological activity of CORE1, CORE2, and CORE3
proteins was determined in vitro by quantitating cell
proliferation as a function of protein concentration. Figure 6
shows the dose-response curves of CORE1, CORE2,
CORE3, and wild-type hGH in a representative assay. EC50
values were determined by nonlinear least-squares fit of
sigmoidal parts of the averaged curves to a four-parameter
sigmoidal equation as described in Materials and Methods.
The designed proteins showed comparable activity to
wild-type hGH (Table 6).

Table 5. Global energy minimum sequence found by PDA using
optimized parameters for a core design of hGH(a) (Design) and
experimentally tested sequences (CORE1, CORE2, CORE3)

Position(b) Wild type  Design  CORE1   CORE2   CORE3
6           Leu        -       -       -       -
10          Phe        -       -       -       -
13          Ala        Val     Val     Val     Val
17          Ala        -       -       -       -
20          Leu        -       -       -       -
24          Ala        -       -       -       -
27          Thr        Val     Val     Val     Val
28          Tyr        Phe     -       -       Phe
31          Phe        -       -       -       -
36          Ile        -       -       -       -
44          Phe        -       -       -       -
54          Phe        Tyr     -       -       Tyr
55          Ser        Ala     -       Ala     Ala
58          Ile        -       -       -       -
73          Leu        -       -       -       -
75          Leu        -       -       -       -
76          Leu        -       -       -       -
78          Ile        -       -       -       -
79          Ser        Ala     Ala     Ala     Ala
80          Leu        -       -       -       -
81          Leu        -       -       -       -
82          Leu        -       -       -       -
83          Ile        -       -       -       -
85          Ser        Ala     -       Ala     Ala
90          Val        Ile     Ile     Ile     Ile
93          Leu        -       -       -       -
96          Val        -       -       -       -
97          Phe        -       -       -       -
105         Ala        -       -       -       -
110         Val        -       -       -       -
114         Leu        Met     -       -       -
117         Leu        -       -       -       -
121         Ile        -       -       -       -
124         Leu        -       -       -       -
157         Leu        -       -       -       -
161         Gly        Met     Met     Met     Met
162         Leu        -       -       -       -
163         Leu        -       -       -       -
166         Phe        -       -       -       -
170         Met        -       -       -       -
173         Val        -       -       -       -
176         Phe        -       -       -       -
177         Leu        -       -       -       -
180         Val        -       -       -       -
184         Ser        Ala     Ala     Ala     Ala

(a) Protein Data Bank structure 3HHR.
(b) The residues are numbered as in the 3HHR file from the Brookhaven Protein Data Bank.
"-" indicates the wild-type residue.
Fig. 4. Structure of CORE3 protein (a model generated by PDA, our
computational design method). The 10 mutations are shown in ball-and- 
stick representation (hydrogen atoms are not shown). 

Discussion 

This study has two purposes: (1) improving our sequence
energy scoring function by both adding an entropy term and
optimizing the relative weights of the energy terms, and (2)
improving the thermostability of hGH. We designed only
the core residues of the protein, rather than the
surface-exposed residues, to reduce the probability of an
immunogenic response to the mutated protein. Designing only core
residues simplified implementation of the entropy penalty.
Core residues can be modeled simply as losing all entropy
relative to the free side-chain, whereas boundary and
surface residues require correction factors, such as scaling
based on accessible surface area (Abagyan and Totrov
1994), to account for remaining conformational flexibility
in the folded state. Designing only core residues also
simplified electrostatic modeling. Optimization of the
weighting factor for electrostatic energy showed that a large
distance-dependent dielectric (ε/R ~10) was necessary to
reduce the magnitude of the electrostatic energy and mitigate
inaccuracies in the charge model, a weakness of all force
fields. Because no charged residues were in the core design,
the inaccuracies of force field approaches for charge-charge
interactions were eliminated from our hGH variants.

Fig. 5. Thermal denaturation monitored by circular dichroism at 222 nm
for wild-type hGH (solid line), CORE1 (dashed line), and CORE2 (dotted
line). [Horizontal axis: temperature (°C), 20 to 100.]

Fig. 6. Proliferation of BAF/B03 cells expressing the hGH receptor in
response to wild-type hGH (diamonds), CORE1 (squares), CORE2
(triangles), and CORE3 (circles). Each point represents the average of
three replicates. [Horizontal axis: protein concentration (pg/mL), 0 to 800.]

In this work, we optimize the weighting factors by
minimizing the number of mutations designed by the algorithm.
That is, we assume that the wild-type sequence
approximately corresponds to the global energy minimum in the
sequence space. This is more correct for highly stable
proteins, which were optimized for stability by nature.
Therefore, further development of this idea should include
calculations on a test set of several highly stable proteins with
known high-resolution X-ray structures. An alternative
approach to optimize the weighting factors is the use of
mutagenesis data to correlate mutant stability with the energy
function predictions. Of particular interest, of course, is
testing the stability of the sequences designed by the algorithm.
Unfortunately, this is a very time-consuming approach.

Table 6. Biological activity of the designed proteins

Protein         EC50 (pg/mL)
Wild-type hGH   220 ± 20
CORE1           320 ± 30
CORE2           260 ± 50
CORE3           230 ± 50

The increased stability seen with CORE1 and CORE2
results from improved van der Waals packing interactions
and increased burial of hydrophobic groups (A13V, T27V,
V90I, G161M) and from replacement of unsatisfied
hydrogen bond donors or acceptors with hydrophobic residues
(T27V, S55A, S79A, S85A, S184A). It should be noted that
our design resulted in the replacement of one threonine and
four serines, residues that do not seem to form hydrogen
bonds in the native protein. Although the role of these T→A
and S→A mutations has not been determined individually,
the considerable improvements in the Tm values obtained
indicate that these mutations are beneficial for stability.

We obtained highly stabilized variants of hGH, a result of 
considerable practical interest and potential clinical signifi- 
cance. An equipotent, but more robust, hGH molecule could 
have improved pharmacokinetics or better storage proper- 
ties or be more amenable to use in alternative delivery sys- 
tems and formulations, thus providing added convenience 
and improved patient compliance. Also, the large increase 
in Tm (~16°C) shows the utility of our optimized energy
function in automated protein design. 

Materials and methods 

Entropy term 

We used the side-chain entropy scale taken from Pickett and
Sternberg (1993) and the backbone entropy scale from Stites and
Pranata (1995). Both scales were derived by analyzing the
distribution of side-chain rotamers and backbone angles in crystal
structures. We assume that all the entropy is lost on folding,
because all the designed residues in the current work are mostly
buried in the core of the protein. Therefore, our entropy scale is
obtained by summing up the side-chain entropy scale and the
backbone entropy scale, corrected for the glycine backbone entropy
loss taken from D'Aquino et al. (1996; Table 1). The correction
does not influence the ranking of the designed sequences, because
it only results in a constant offset. The following parameters were
used in the calculations for optimization of the entropy weighting
factor: distance-dependent electrostatic term with ε = 40R (R is
the interatomic distance), penalty for polar hydrogen burial of
2 kcal/mole, and surface-based nonpolar exposure penalty and
nonpolar burial benefit of 0.0232 kcal/mole/Å². The following amino
acids were allowed at the designed positions: Ala, Val, Phe, Ile,
Leu, Tyr, Trp, Met, and Ser.
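
The remark that a constant offset leaves the ranking unchanged can be checked with a small sketch. It assumes a simplified model in which every designed position receives the same additive correction; the entropy values are hypothetical and the function names are introduced here for illustration only.

```python
def total_entropy_penalty(sequence, scale, weight=2.3, offset=0.0):
    """Weighted sum of per-residue entropy costs, with an optional
    constant per-residue offset."""
    return weight * sum(scale[res] + offset for res in sequence)

# Hypothetical per-residue entropy costs (arbitrary units).
scale = {"Ala": 0.0, "Val": 0.5, "Leu": 0.8, "Met": 1.5, "Ser": 1.7}
candidates = [("Ala", "Val", "Leu"), ("Met", "Ser", "Ala"), ("Val", "Val", "Val")]

def ranking(offset):
    """Candidates ordered from lowest to highest entropy penalty."""
    return sorted(candidates,
                  key=lambda seq: total_entropy_penalty(seq, scale, offset=offset))

print(ranking(0.0) == ranking(1.2))
```

Every candidate has the same number of designed positions, so the offset adds the same amount to each total and the order cannot change.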

Weighting factor for the electrostatic term 

The following parameters were used: entropy term with weighting
factor of 2.3, penalty for polar hydrogen burial of 2 kcal/mole, and
surface-based nonpolar exposure penalty and nonpolar burial
benefit of 0.048 kcal/mole/Å². The following amino acids were
allowed at the designed positions: Ala, Val, Leu, Ile, Phe, Tyr, Trp,
Asp, Asn, Glu, Gln, Lys, Ser, Thr, His, Hsp, Arg, and Met.

Weighting factor for the polar hydrogen burial penalty 

The following parameters were used: entropy term with weighting
factor of 2.3, dielectric constant of 10.3R, and surface-based
nonpolar exposure penalty and nonpolar burial benefit of 0.048
kcal/mole/Å². The amino acids allowed at the designed positions were
the same as for the optimization of the electrostatics term.

Computational design 

The crystal structure of hGH (Brookhaven Protein Data Bank code
3HHR) was used as the starting point. The program BIOGRAF
(Molecular Simulations Inc.) was used to generate hydrogens on
the structure and to minimize it (50 steps of conjugate gradient
minimization with the Dreiding II force field; Mayo et al. 1990).
Residues were classified as core, surface, or boundary using the
RESCLASS program (Dahiyat and Mayo 1997a). The parameters
not specified in the Results section are described in other work
(Dahiyat and Mayo 1996, 1997a). An expanded version (Dahiyat
and Mayo 1996) of the backbone-dependent rotamer library of
Dunbrack and Karplus (1993) was used in all the calculations.

Cloning and expression 

A gene for hGH was synthesized from partially overlapping
oligonucleotides (~100 bases) that were extended and PCR amplified.
Codon usage was optimized for Escherichia coli, and several
restriction sites were incorporated to ease future cloning. These
partial genes were cloned into a vector and transformed into E. coli
for sequencing. Several of these gene fragments were then cloned into
adjacent positions in an expression vector (pET17 or pET21) to
form the full-length gene for hGH and transformed into E. coli for
expression. Protein was expressed in E. coli in insoluble inclusion
bodies, and its identity was confirmed by immunoblot of
SDS-PAGE using a commercial mAb against hGH (Santa Cruz
Biotechnology).

Refolding 

The protein inclusion bodies were dissolved and washed
consecutively using wash buffer A (100 mM Tris at pH 8, 2% Triton, 4
M urea, 5 mM EDTA, 0.5 mM DTT) and wash buffer B (100 mM
Tris at pH 8, 0.5 mM DTT), and the solvents were removed by
centrifuging at 20,000g for 30 min. The pellet was resuspended
with extraction buffer (50 mM glycine, 0.0156 M NaOH, 5 mM
reduced glutathione, 8 M GdnHCl at pH 9.6). The supernatant was
dialyzed for 12 to 16 h against folding buffer A (50 mM glycine,
0.0156 M NaOH, 10% sucrose, 1 mM EDTA, 1 mM reduced
glutathione, 0.1 mM oxidized glutathione, 4 M urea at pH 9.6). The
supernatant was dialyzed for 6 to 8 h in buffer B (60 mM Tris, 10%
sucrose, 1 mM EDTA, 0.1 mM reduced glutathione, 0.01 mM
oxidized glutathione at pH 9.6).

Purification 

A size exclusion column (10 mm x 300 mm loaded with Superdex
prep 75 resin purchased from Pharmacia) was loaded with protein
and eluted at a flow rate of 0.8 mL/min using the column buffer
(100 mM Na2SO4, 50 mM Tris at pH 7.5). The peaks were
monitored at dual wavelengths of 214 and 280 nm. Albumin, carbonic
anhydrase, cytochrome C, and aprotinin were used to calibrate the
molecular size of proteins versus elution time. The monomeric
peak that elutes around the expected elution time for each protein
was collected for biophysical characterization. The proteins were
>98% pure as judged by reversed-phase high performance liquid
chromatography on a C4 column (3.9 mm x 150 mm), with a linear
acetonitrile-water gradient containing 0.1% TFE. The identities of
all proteins were confirmed by comparing the molecular mass
measured by mass spectrometry with the corresponding molecular
mass calculated using the protein sequences.



Spectroscopic characterization 

Protein samples were 50 μM in 50 mM sodium phosphate (pH
5.5). Concentrations were determined using ultraviolet
spectrophotometry. Protein structure was assessed by circular
dichroism. Circular dichroism spectra were measured on an Aviv 202DS
spectrometer equipped with a Peltier temperature control unit using a
1-mm path length cell. Thermal stability was assessed by
monitoring the temperature dependence of the circular dichroism signal
at 222 nm. The data were collected every 2.5°C, with an averaging
time of 5 sec and an equilibration time of 3 min. The Tm of each
protein was derived from the derivative curve of the ellipticity at
222 nm versus temperature. Tm values were reproducible to within
2°C for the same protein at the concentrations used.
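
Deriving Tm from the derivative curve amounts to finding where the ellipticity changes fastest with temperature. The sketch below uses a plain finite-difference slope on synthetic data; it is an assumed illustration of the idea, not the instrument software's procedure.

```python
def tm_from_derivative(temps, signal):
    """Estimate Tm as the midpoint of the interval with the steepest
    change in signal (largest absolute finite-difference slope)."""
    best_slope, best_tm = 0.0, None
    for i in range(len(temps) - 1):
        slope = abs((signal[i + 1] - signal[i]) / (temps[i + 1] - temps[i]))
        if slope > best_slope:
            best_slope = slope
            best_tm = (temps[i] + temps[i + 1]) / 2
    return best_tm

# Synthetic melting curve sampled every 2.5 degrees C (not experimental data).
temps = [70.0 + 2.5 * i for i in range(9)]            # 70.0 to 90.0
signal = [-30, -30, -29, -27, -20, -12, -10, -9, -9]  # ellipticity, arbitrary units
tm = tm_from_derivative(temps, signal)
print(tm)
```

With 2.5°C sampling the estimate can only land on interval midpoints, which fits the roughly 2°C reproducibility quoted above.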



Cell proliferation assay 

Cell proliferation assays were performed using an interleukin
3-dependent murine pro-B cell line, BAF/B03, stably transfected
with the full-length human growth hormone receptor (Behncken et
al. 1997) according to the method of Rowlinson et al. (1995,
1996). Cells were maintained in RPMI-1640 medium with 5%
fetal calf serum (FCS), 1 μg/mL gentamicin, and 50 units/mL
interleukin 3. In preparation for the assay, exponentially growing
cells were washed twice in PBS and resuspended in hGH-free and
phenol red-free RPMI-1640 media with 5% FCS and 1 μg/mL
gentamicin. Serially diluted hGH was then added to 96-well
microtiter plates containing 2.5 x 10^4 cells/well. After 24 h of
incubation at 37°C in 5% CO2, cell proliferation was quantified
using the MTT assay. In each assay, the wild type and all three of
the designed variants of hGH were tested in triplicate on the same
plate. The entire assay was repeated three times. EC50 values were
determined using KaleidaGraph (Synergy Software) by nonlinear
least-squares fit of sigmoidal parts of the averaged curves to a
four-parameter equation:

$$\mathrm{OD} = \mathrm{OD}_{\max} - \frac{\mathrm{OD}_{\max} - \mathrm{OD}_{\min}}{1 + \left([\mathrm{hGH}]/\mathrm{EC}_{50}\right)^{n}}$$

as performed by Young et al. (1997).
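
A four-parameter sigmoid of the usual form, rising from a minimum to a maximum optical density with a Hill-type exponent, can be written as below. The parameter names and values are assumptions made for this sketch; only the EC50 of 220 pg/mL echoes Table 6, and no fitting is performed.

```python
def four_parameter(conc, od_min, od_max, ec50, n):
    """Four-parameter sigmoid: OD rises from od_min to od_max with dose."""
    return od_max - (od_max - od_min) / (1.0 + (conc / ec50) ** n)

# Illustrative parameters, not fitted values.
od_min, od_max, ec50, hill = 0.1, 1.1, 220.0, 2.0
at_zero = four_parameter(0.0, od_min, od_max, ec50, hill)
at_ec50 = four_parameter(ec50, od_min, od_max, ec50, hill)
print(at_zero, at_ec50)
```

At zero dose the curve sits at the minimum, and at a dose equal to EC50 it is halfway between minimum and maximum, which is what makes EC50 a convenient potency measure.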
Abstract

Granulocyte-colony stimulating factor (G-CSF) is used worldwide to prevent neutropenia caused by high- 
dose chemotherapy. It has limited stability, strict formulation and storage requirements, and because of poor 
oral absorption must be administered by injection (typically daily). Thus, there is significant interest in 
developing analogs with improved pharmacological properties. We used our ultrahigh throughput compu- 
tational screening method to improve the physicochemical characteristics of G-CSF. Improving these 
properties can make a molecule more robust, enhance its shelf life, or make it more amenable to alternate 
delivery systems and formulations. It can also affect clinically important features such as pharmacokinetics. 
Residues in the buried core were selected for optimization to minimize changes to the surface, thereby 
maintaining the active site and limiting the designed protein's potential for antigenicity. Using a structure 
that was homology modeled from bovine G-CSF, core designs of 25-34 residues were completed, corre- 
sponding to 10*^^-10^^ sequences screened. The optimal sequence from each design was selected for 
biophysical characterization and experimental testing; each had 10-14 mutations. The designed proteins 
showed enhanced thermal stabilities of up to 13°C, displayed five- to 10-fold improvements in shelf life, and
were biologically active in cell proliferation assays and in a neutropenic mouse model. Pharmacokinetic 
studies in monkeys showed that subcutaneous injection of the designed analogs results in greater systemic 
exposure, probably attributable to improved absorption from the subcutaneous compartment. These results 
show that our computational method can be used to develop improved pharmaceuticals and illustrate its 
utility as a powerful protein design tool. 

Keywords: Protein design; computational screen; stability; cytokines; granulocyte-colony stimulating 
factor 



Many techniques have been used in the design of new and 
improved proteins. In vitro directed evolution methods such 
as phage display, DNA shuffling, and error-prone PCR are 
widely used. Rational design approaches continue to be ap- 
plied, and strategies that combine both are now being used. 
Successful designs include enzymes (Chen and Arnold
1991; Stemmer 1994; Zhao et al. 1998) and other proteins
(Crameri et al. 1996), as well as therapeutically useful
proteins such as hormones and cytokines (Lowman and Wells
1993; Heikoop et al. 1997; Grossmann et al. 1998; Chang et
al. 1999). The experimental techniques involve the genera- 
tion and screening of libraries of random protein sequences. 
However, the number of sequences that can be screened ex- 
perimentally is Umited (about 10'"* for library panning and 10^ 
for high throughput screening). Libraries of this size allow for 
the simultaneous modification of only about 10 residues. 
Computational methods have also been used that perform
in silico screening of protein sequences (Hellinga and
Richards 1994; Desjarlais and Handel 1995; Dahiyat and
Mayo 1996, 1997a; Street and Mayo 1999; Jiang et al. 2000;
Kraemer-Pecore et al. 2001; Pokala and Handel 2001).
Exploiting the efficiency and speed of computers, these
methods can randomly screen a vast number of sequences (up to
10^80), allowing for the simultaneous consideration and
modification of more than 60 residues. Searching such large
sequence spaces drastically improves the possibility of
finding novel protein sequences with improved properties.
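
The link between library size and the number of residues that can be varied together is simple arithmetic: with 20 amino acids allowed at each of N positions there are 20^N sequences. The sketch below works this out, taking the upper computational figure as 10^80 (an assumption where the scan is unclear); the function names are introduced here for illustration.

```python
import math

def log10_sequences(n_positions, n_amino_acids=20):
    """log10 of the number of sequences when n_positions vary freely."""
    return n_positions * math.log10(n_amino_acids)

def max_positions(log10_library_size, n_amino_acids=20):
    """Largest number of fully randomized positions a library can cover."""
    return math.floor(log10_library_size / math.log10(n_amino_acids))

ten_residue_space = round(log10_sequences(10), 1)   # exponent for 10 positions
residues_at_1e80 = max_positions(80)                # positions covered by 10^80
print(ten_residue_space, residues_at_1e80)
```

Ten fully randomized positions already give about 10^13 sequences, while a space of 10^80 covers 61 positions, consistent with "more than 60 residues".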

Investigators have recently developed a computational
screening method that finds the optimal sequence for a
defined three-dimensional structure, allowing all or part of the
sequence to change (Dahiyat and Mayo 1996). This method,
termed Protein Design Automation (PDA), scores the fit of
sequences to the three-dimensional structure using
physical-chemical potential functions that model the energetic
interactions of protein atoms, including steric, solvation, and
electrostatic interactions. PDA couples these potential
functions with a highly efficient search algorithm to accurately
screen up to 10^80 sequences. Because the screening is
performed in silico, multiple simultaneous mutations can be
made, and novel sequences that are very different from wild
type can be discovered. The method has been validated by
numerous experimental tests and has resulted in the design
of new proteins with improved stability and conformational
specificity, and novel activity (Dahiyat and Mayo 1996,
1997a; Malakauskas and Mayo 1998; Strop and Mayo 1999;
Shimaoka et al. 2000; Bolon and Mayo 2001; Marshall and
Mayo 2001).

PDA also has the advantage of being able to control the 
location and type of mutations. For example, the design can 
be limited to the hydrophobic core. Mutations in the core 
can produce significant improvements in protein stability 
but do not change binding epitopes on the surface of the 
molecule. Thus, the molecular surface can be kept identical 
to the native structure, retaining biological activity and lim- 
iting toxicity and antigenicity. This feature is particularly 
important in the design of therapeutic proteins. 

We wanted to take advantage of these features of PDA 
and explore its utility in the design of improved pharmaceuticals.
We therefore used PDA as an ultrahigh throughput
screen for improved analogs of a therapeutic protein,
granulocyte-colony stimulating factor (G-CSF). G-CSF is a 
hematopoietic growth factor of 174 residues that induces 
differentiation and proliferation of granulocyte-committed 
progenitor cells. It is used clinically to treat cancer patients 
and alleviate the neutropenia induced by high-dose chemo- 
therapy. G-CSF belongs to the class of long-chain four- 
helix bundle cytokines that bind asymmetrically to homodi- 
meric complexes of cell-surface receptors to initiate an in- 
tracellular signaling cascade. Their structural similarity 
allows the design strategy chosen for G-CSF to be immediately
applicable to the other four-helix bundle cytokines
(human growth hormone, erythropoietin, the interleukins,
and interferon-α/β, all clinically important compounds)
and thus broadens the potential impact of the results.

Although the cytokines are functionally very efficacious, 
their pharmacological properties are not ideal. For example, 
G-CSF, like most proteins, is not absorbed orally to any 
significant extent and must be administered by frequent 
(daily) injections throughout the course of treatment. It also
has limited stability and strict formulation and storage
requirements, including the need to be kept refrigerated. Thus,
there is significant interest in developing analogs with im- 
proved pharmacological properties. 

We sought to use PDA to improve the physicochemical 
characteristics of G-CSF. Improving these properties can 
make a molecule more robust, enhance its shelf life, or 
make it more amenable to use in alternate delivery systems 
and formulations. It can also affect clinically important fea- 
tures such as pharmacokinetics and result in a drug that is 
safer for human use. Our design strategy was to optimize the 
core to improve the stability and solution properties of 
G-CSF while preserving receptor binding and biological 
activity. 

The template structure used for in silico screening was a 
homology model of human G-CSF in which the human 
sequence was mapped onto bovine G-CSF. We designed 
several novel core sequences, cloned and expressed them, 
characterized their stabilities, tested them for functional ac- 
tivity both in vitro and in vivo, and studied their pharma- 
cokinetics in monkeys. The designed proteins showed en- 
hanced thermal stabilities, displayed five- to 10-fold im- 
provements in shelf life, and were biologically active both 
in cell proliferation assays and in a neutropenic mouse 
model. Subcutaneous injection of the most stable variant in 
monkeys also resulted in greater systemic exposure, prob- 
ably attributable to improved absorption from the subcuta- 
neous compartment. These results indicate that PDA has 
great potential as a powerful in silico tool in the design of 
improved pharmaceutical proteins. 

Results and Discussion 

Homology modeling 

The crystal structure of bovine G-CSF (PDB record 1bgc)
(Lovejoy et al. 1993) was used as the starting point for
modeling because the crystal structure of human G-CSF
(PDB record 1rhg) (Hill et al. 1993) is at a lower resolution
and is missing key fragments, including a structurally im- 
portant disulfide bond between positions 64 and 74. Bovine 
G-CSF is a good model for human G-CSF because the 
sequences are the same length and 142 of 174 amino acids 
are identical (82%). The residues that differ in the bovine
sequence were replaced with the human residues for those
positions, and the conformations of the replaced side chains
were optimized using PDA. Most of the replaced residues
were solvent exposed, thereby introducing little strain into
the structure and allowing typical PDA parameters to be
used for conformation optimization. One substitution, however,
was at a buried site, G167V, and clashed sterically
with a nearby disulfide bond. To accommodate the larger
Val, the side-chain conformation at this position was optimized
using a less restrictive van der Waals scale factor (0.6
instead of 0.9). The entire structure was then briefly minimized
to relax the strain. The final structure that served as
the template for all the designs is shown in Figure 1.
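
The 82% identity figure follows directly from the residue counts quoted above. A minimal sketch of that arithmetic is given here; the function name `percent_identity` is an illustrative assumption and is not part of the original work, and the comparison assumes equal-length, gap-free sequences as described for bovine and human G-CSF.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of positions with identical residues (equal-length, gap-free sequences)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be the same length for a gap-free comparison")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)

# Counts quoted in the text: 142 identical residues out of 174.
reported_identity = 100.0 * 142 / 174
print(f"{reported_identity:.1f}%")  # 81.6%, which rounds to the reported 82%
```

Given the two full sequences as strings, `percent_identity(human, bovine)` would reproduce the same number without counting by hand.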

Core designs 

Unlike many experimental sequence screening methods, 
PDA allows control over which residues are allowed to 




Key to Fig. 1:
Residues identical in bovine and human sequences (82%)
Residues that differ; replaced by human residues
Fragments missing in human crystal structure
Side chains not viewable in crystal structure; replaced by wild-type side chains using PDA
Val at position 167 clashes with adjacent disulfide
Hot spot residues believed to be important for granulopoietic activity

Fig. 1. Template structure of hG-CSF used for Protein Design Automation 
(PDA) designs. The human sequence was homology modeled onto the 
bovine crystal structure (PDB record 1bgc). The residues that differ in the
bovine sequence or were not present in the bovine crystal structure were 
replaced with the residues from the human sequence. The conformations of 
the replaced side chains were optimized using PDA (the larger Val at 
position 167 was optimized using a less restrictive van der Waals scale 
factor), and the entire structure was energy minimized for 50 steps. 



change. Core residues were selected because optimization 
of these positions can improve stability yet minimize 
changes to the molecular surface, thus limiting the designed 
protein's potential for antigenicity. Ala scanning studies of 
G-CSF indicate one or two binding sites on the protein 
surface that are probably responsible for granulopoietic activity
(Reidhaar-Olson et al. 1996; Young et al. 1997) (Fig.
1). Although recent crystallographic studies of G-CSF com- 
plexed to its receptor show only one binding site in a novel 
2:2 complex (Horan et al. 1996; Aritomi et al. 1999), both 
sites were avoided in the core designs to ensure preservation 
of function. 

Two PDA design calculations were run: a deep core design
that included residues deeply buried in the interior of
the protein and an expanded core design (exp_core) that
also included less buried peripheral core residues. The deep
core design had 26 core positions that were allowed to vary
(shown yellow and gold in Fig. 2), whereas exp_core had 34
(shown yellow and turquoise in Fig. 2). Only hydrophobic
amino acids were considered at the variable core positions.
These included Ala, Val, Ile, Leu, Phe, Tyr, and Trp. Gly
was also allowed for the variable positions that had Gly in
the bovine wild-type structure (positions 28, 149, 150, and
167). Met and Pro were not allowed.
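
To give a sense of the scale of these design problems, the sketch below counts the amino-acid sequences implied by the stated residue set and position counts. It is only an illustration: the helper `search_space_size` is hypothetical, it counts sequences rather than side-chain rotamers, and because the text does not say how many variable positions in each design admit Gly, the default ignores Gly and therefore gives a lower bound.

```python
# Hydrophobic residue set allowed at variable core positions (from the text).
ALLOWED = ("Ala", "Val", "Ile", "Leu", "Phe", "Tyr", "Trp")

def search_space_size(n_positions: int, n_gly_positions: int = 0) -> int:
    """Number of sequences when every variable position may take any allowed
    residue and n_gly_positions of them may additionally be Gly."""
    n_regular = n_positions - n_gly_positions
    return len(ALLOWED) ** n_regular * (len(ALLOWED) + 1) ** n_gly_positions

deep_core = search_space_size(26)  # deep core design, Gly ignored
exp_core = search_space_size(34)   # expanded core design, Gly ignored
print(f"deep core: {deep_core:.2e} sequences")
print(f"exp_core:  {exp_core:.2e} sequences")
```

Even with only seven residue types per position, both designs are far too large to enumerate experimentally, which is the motivation for screening in silico.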

Optimal sequences 

The optimal sequences selected by PDA are also shown in
Figure 2. The optimal sequence from the deep core design
had 10 mutations (named core10), and the optimal exp_core
sequence had 11 (named exp_core11); thus, 33%-38% of
the variable residues changed their identities. Eight of the
mutated positions changed to the same amino acid in both
designs. Changing the set of design positions can significantly
impact the amino acid selected at a given position.
For example, in the deep core design, Leu89 retains the
same amino-acid identity and conformation as wild type.
However, in the exp_core design, when Leu92 is also allowed
to vary, both positions (Leu89 and Leu92) mutate to
Phe, indicating a coupling between these two core residues.
The modeled structure of the sequence selected in the deep
core design (core10) is shown in Figure 3.

Native human G-CSF (met hG-CSF) and the optimal se- 
quence from each of the core designs were cloned, ex- 
pressed in Escherichia coli, and purified for experimental 
studies. 

Thermal stability 

The far-ultraviolet (UV) circular dichroism (CD) spectra for
met hG-CSF and the designed proteins were nearly identical 
to each other and to published spectra for met hG-CSF 
(Reidhaar-Olson et al. 1996; Young et al. 1997), indicating 
highly similar secondary structure and tertiary folds (data 
Fig. 2. Sequences of hG-CSF analogs. Native human and bovine sequences are shown at the top. The fragments missing in the crystal
structure of the human sequence are shown boxed. Variable positions are colored. The deep core design had 26 variable positions,
exp_core had 34, and core167V had 25. The optimal sequence from each design is shown. Letters indicate core residues that mutated
relative to native hG-CSF; blanks indicate no change. Positions that changed to the same amino acid in all three core designs are
indicated in bold. Core2 and core8 sequences were not obtained from PDA calculations but were derived by reverting some of the
core10 mutations to wild type. Melting temperatures (Tm values) obtained for the designed proteins are also shown.



not shown). Thermal denaturation was monitored at 222
nm, and the melting temperatures (Tm values) were derived from
the derivative curve of the ellipticity at 222 nm versus
temperature (Fig. 4). Thermal denaturation of G-CSF and its
variants is irreversible; however, Tm can be used to quickly
assess the relative stability of different mutants. Stability
under storage conditions, which is more relevant clinically,
was evaluated with shelf-life studies (see below).
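
One way to read a Tm off a derivative curve is sketched below. This is not the authors' analysis: the function `melting_temperature` is a hypothetical helper, the unfolding curve is synthetic (an assumed two-state sigmoid centered at 60°C), and the only detail taken from the source is the 2.5°C sampling interval.

```python
import math

def melting_temperature(temps, ellipticity):
    """Return the temperature at which |d(ellipticity)/dT| is largest,
    using central finite differences on the interior points."""
    best_t, best_slope = None, -1.0
    for i in range(1, len(temps) - 1):
        slope = (ellipticity[i + 1] - ellipticity[i - 1]) / (temps[i + 1] - temps[i - 1])
        if abs(slope) > best_slope:
            best_t, best_slope = temps[i], abs(slope)
    return best_t

# Synthetic two-state unfolding curve (illustrative only, not measured data).
true_tm, width = 60.0, 2.0
temps = [20.0 + 2.5 * i for i in range(29)]  # 20 to 90 degrees C in 2.5 degree steps
signal = [-20.0 / (1.0 + math.exp((t - true_tm) / width)) for t in temps]

print(melting_temperature(temps, signal))  # 60.0
```

With real data the curve would normally be smoothed first, since finite differences amplify noise; the resolution of the estimate is also limited by the temperature step.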

The Tm for met hG-CSF was 60°C, identical to that reported
in other studies (Kolvenbach et al. 1997). Core10
showed an increase in stability of 13°C, whereas the Tm of
exp_core11 was very similar to wild type (Fig. 2 and Fig. 4).
The increased stability seen with core10 may be attributable
to improved packing interactions and optimized hydrophobic
burial of side chains. Other possibilities include decreased
aggregation resulting from elimination of the free



cysteine at position 17. The Gly to Ala mutation at position 
28 caused a significant improvement in helical propensity 
that could also be the source of the improved stability. 

Identifying critical mutations using derived sequences 

To differentiate between these possibilities, two additional
sequences derived from the core10 mutant sequence were
made and their Tm values measured. One of these (core8) was
identical to core10 except that two mutations distant from
the others were reverted to wild type (L103V and V110I).
These were the two positions that did not mutate in
exp_core11. The Tm of core8 was 70°C, similar to core10,
indicating that the mutations at 103 and 110 were not
responsible for core10's improved stability.
Key to Fig. 3: variable residues (26)
Fig. 3. Modeled structure of hG-CSF analog (core10) obtained from deep
core design. Twenty-six core residues were allowed to vary; computational
screening with PDA resulted in 10 mutations: C17L, G28A, L78F, Y85F,
L103V, V110I, F113L, V151I, V153I, and L168F.

To determine the importance of the other mutations, another
sequence was made (core2) that contained only two of
the core10 mutations, G28A and C17A; all other residues



were identical to wild type (Fig. 2). The Tm of core2 was
5°C higher than wild type, indicating that improvements in
helical propensity and the elimination of a free cysteine are
important for heightened thermostability. The remainder of
the increase in Tm seen for core10 may be attributable to
improved packing interactions and increased hydrophobic
burial.



Storage stability 

Increased shelf life is important for distribution and storage 
and is a desirable feature for G-CSF and other protein drugs. 
Because aggregation and chemical degradation are the pre- 
dominant mechanisms of inactivation of G-CSF (Herman et 
al. 1996), shelf life was estimated by incubating the proteins 
at elevated temperature and then using size-exclusion chro- 
matography to observe the disappearance of monomeric 
protein. Chemical degradation was estimated using reverse 
phase chromatography (data not shown). Core2 and core10
showed five- and 10-fold improvements in storage stability, 
respectively, at 50°C (Fig. 5). Rate constants were deter- 
mined by a first order exponential fit of the fraction mono- 
mer remaining/time curves using KaleidaGraph (Synergy 
Software). 
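
The rate-constant comparison described above can be sketched as follows. This is an assumption-laden illustration, not the KaleidaGraph procedure: it estimates k by a log-linear least-squares fit through the origin rather than a nonlinear fit, the helper `first_order_rate_constant` is hypothetical, and the time points and rate constants are invented so that the ratio comes out at ten.

```python
import math

def first_order_rate_constant(times, fraction_monomer):
    """Least-squares estimate of k in f(t) = exp(-k * t),
    obtained by fitting ln(f) = -k * t through the origin."""
    num = sum(t * math.log(f) for t, f in zip(times, fraction_monomer))
    den = sum(t * t for t in times)
    return -num / den

# Hypothetical data for illustration (not values from the study).
times = [0.0, 1.0, 2.0, 4.0, 8.0]   # days at elevated temperature
k_ref, k_variant = 0.50, 0.05       # per day; assumed rate constants
ref = [math.exp(-k_ref * t) for t in times]
variant = [math.exp(-k_variant * t) for t in times]

k1 = first_order_rate_constant(times, ref)
k2 = first_order_rate_constant(times, variant)
print(f"fold improvement in storage stability: {k1 / k2:.1f}")
```

A fold improvement in this sense is simply the ratio of the control's monomer-loss rate constant to the variant's.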

Biological activity 

Granulopoietic activity was determined in vitro by quantitating
cell proliferation as a function of protein concentration
in murine lymphoid cells transfected with the gene for
the human G-CSF receptor. The designed proteins were as 
Fig. 4. Thermal stability of hG-CSF analogs. Thermal stability was assessed
by monitoring the temperature dependence of the circular dichroism
spectral signal at 222 nm. Melting temperatures (Tm) were derived from
the derivative curve of the ellipticity at 222 nm versus temperature. Core10
and core2 showed increases in Tm of 13°C and 5°C, respectively, over
native met hG-CSF.
Fig. 5. Shelf life of hG-CSF analogs. Shelf life was estimated by incubating
the proteins at elevated temperature (50°C) and using size exclusion
chromatography to observe disappearance of monomeric protein. Rate constants
were determined by a first order exponential fit of the fraction
monomer remaining/time curves. Core2 and core10 showed five- and 10-fold
improvements in storage stability, respectively, over met hG-CSF
controls.
Fig. 6. In vivo granulopoietic activity of hG-CSF analogs. Mice were
rendered neutropenic with a single intraperitoneal injection of 200 mg/kg
cyclophosphamide (CPA). Beginning 24 h later and for 4 consecutive days,
the mice were given a daily intravenous injection of 100 µg/kg of native
hG-CSF (filgrastim, Amgen), an hG-CSF analog, or saline. On day 5,
granulopoietic activity was determined by counting the number of white
blood cells and polymorphonuclear neutrophils (PMN). The designed analogs
(core8 and core10) were as effective as controls in eliciting a
granulopoietic response.

active as wild-type hG-CSF (data not shown). The designed 
analogs were also as effective as wild type in increasing 
white blood cell and polymorphonuclear neutrophil levels in 
the neutropenic mouse (Fig. 6). Neutropenia, characterized 
by an abnormally low level of neutrophils in the blood, was 
induced by injection of cyclophosphamide. Reversal of this 
effect by the designed analogs shows that granulopoietic
activity was also retained in vivo. 

Pharmacokinetics 

The pharmacokinetics of core10 and native hG-CSF (filgrastim,
Amgen) were studied in cynomolgus monkeys after
a single subcutaneous or intravenous injection of 5 µg/kg
and after daily subcutaneous injections of 5 µg/kg for 28 d.
Analysis of the serum concentration-time curves shows that
subcutaneous injection of the designed analog results in
greater systemic exposure (area under concentration-time
curve, AUC) than the same dose of wild-type hG-CSF (Fig.
7B). This was true after a single dose on day 1 (78.8 vs. 54.6
ng·h/mL, data not shown), as well as on day 28 (37.2 vs.
17.4 ng·h/mL). There were no measurable differences in
serum half-life. In the intravenous study, however, the half-life
of core10 was three-fold shorter (1 vs. 3 h), and the
AUC was significantly less (54.7 vs. 117.4 ng·h/mL), indicating
that core10 is cleared faster (Fig. 7A). Taken together,
these data indicate that the designed analog is absorbed
more quickly from the subcutaneous compartment
(absorption could not be measured directly given the small
number of data points at early times). Improved absorption
may be attributable to decreased aggregation or association
of the designed protein. The increased monomer lifetime
and decreased aggregation seen in our shelf-life studies and



the improved thermal stability of the native conformation
observed for core10 indicate a decrease in aggregation in
the subcutaneous compartment. This possibility is supported
by the fact that other protein therapeutics engineered
for reduced aggregation also show faster absorption rates.
For example, insulin Lispro and other rapid-acting insulin
analogs that were designed to decrease their tendency to
self-associate are absorbed faster than regular insulin after
subcutaneous injection (Howey et al. 1994; Home et al.
1999).
Fig. 7. Pharmacokinetics of hG-CSF analogs. Plasma concentrations of a
designed hG-CSF analog or wild-type hG-CSF (filgrastim, Amgen) were
determined after administration in cynomolgus monkeys. (A) Animals were
given a single intravenous injection of 5 µg/kg or (B) daily subcutaneous
injections of 5 µg/kg for 28 d. Noncompartmental analysis of the serum
concentration-time curves shows that subcutaneous injections of the core10
analog resulted in greater systemic exposure (area under concentration-time
curve, AUC) than the same dose of wild-type hG-CSF, whereas there
was no change in serum half-life (t1/2). In the intravenous study, the AUC
was significantly less and the t1/2 three-fold shorter, indicating that core10
was cleared faster.
Comparison to published G-CSF variants

In vitro and cassette mutagenesis studies have shown that
alterations of the N-terminal region of G-CSF can lead to
improved granulopoietic activity (Kuga et al. 1989; Okabe
et al. 1990). Point mutations at Cys17 have also been found
to affect shelf life; replacement with Ala led to an increase,
Ser had no effect, and large residues (Ile, Tyr, Arg) led to a
decrease (Ishikawa et al. 1992). In contrast, our core10
sequence, which has a large residue (Leu) at this position,
showed an improved shelf life. This may be explained by
the observation that in a Cys17Leu point mutant, Leu's side
chain would clash with the aromatic ring of the nearby Phe
at position 113. This steric clash does not occur in core10,
however, because the Phe at 113 is replaced by Leu and, in
compensation for this change, two nearby Leu's become
Phe's (at positions 78 and 168). Thus, multiple mutations
allow complementary repacking of the hydrophobic core in
the core10 mutant and may be responsible for its enhanced
stability and shelf life.

Significant improvements in thermal stability were also 
observed when the seven helical Gly residues in G-CSF 
were replaced with Ala to form point, double, and triple 
mutants (Bishop et al. 2001). Substitutions at positions 26, 
28, 149, and 150 were the most effective. The investigators 
attributed the stabilizing effect to the enhancement in α-helical
propensity associated with the Gly/Ala substitutions.
These data support our suggestion that the heightened ther- 
mal stability seen with our mutants (which also contain a 
Gly/Ala substitution at position 28) is at least in part attrib- 
utable to an improvement in helical propensity. 

Probing the robustness of PDA with 
a homology modeled core position 

As pointed out previously, the homology modeling of human
G-CSF onto the bovine structure was straightforward
for the most part because the replaced residues were primarily
solvent exposed and no rearrangement of the backbone
was necessary. The change at one core position, however,
G167V, induced a steric clash and energy minimization of
the entire protein was used to relieve the strain. We decided
to assess the impact of this manipulation by doing an additional
design (core167V) in which the variable residues
were essentially the same as in the deep core design except
that position 167 was also allowed to vary. We found that
Val167 mutated to Ala (the other mutations were essentially
the same as for core10). To probe the plasticity of the core,
instead of using this PDA optimal sequence, which only had
two mutations in this region, we ran experiments on another
high-scoring sequence (core14_V167A) that had additional
mutations (14 total, including L157I, F160W, and L161F).
This sequence was chosen because it balanced an extensive
number of mutations with a relatively high design score.



Although it ranked 21st in the sequence energy list and was
2 kcal/mole less favorable than the optimal sequence, it was
still biologically active and as stable as wild type (Tm of
61°C) (Figs. 2, 4). This indicates that optimization with
PDA is fairly robust, and that the protein core can be quite
plastic and can accommodate large changes without sacrificing
stability or function.

Conclusions 

PDA is a powerful ultrahigh throughput computational 
screening method. Its ability to screen up to 10^80 sequences
and allow multiple simultaneous mutations significantly in- 
creases the likelihood of finding new and improved pro- 
teins. In this study, PDA was used to develop improved 
analogs for a therapeutically important protein, hG-CSF.
The novel proteins showed enhanced thermal stabilities and
shelf life while retaining biological activity. Analysis of the 
mutants and results obtained with derived sequences indi- 
cates that the heightened stability is attributable to improve- 
ments in helical propensity and the elimination of a free 
cysteine; improved core packing and optimized hydropho- 
bic burial of side chains may also be important. Pharmaco- 
kinetic studies indicate that subcutaneous injection of the 
most stable variant results in greater systemic exposure, 
probably attributable to improved absorption from the sub- 
cutaneous compartment. 

These results show that PDA can be successfully applied 
to proteins of therapeutic interest. They also illustrate the 
value of its precise control over the site and type of muta- 
tions, allowing for the rational design of desired properties 
such as improved stability and pharmacokinetics and the 
elimination of undesirable ones such as toxicity and antige- 
nicity. These features are particularly important in the de- 
sign of therapeutic proteins. PDA thus has great potential as 
a powerful in silico tool for therapeutic protein design. 

Materials and methods 

Template structure preparation 

The template structure for the designed proteins was produced by
homology modeling using the crystal structure of bovine G-CSF
(Brookhaven Protein Data Bank code 1bgc) as the starting point.
The program BIOGRAF (Molecular Simulations Inc., San Diego,
CA) was used to generate explicit hydrogens on the structure,
which was then minimized for 50 steps using the conjugate gradient
method and the Dreiding II force field (Mayo et al. 1990).
The residues that differ in the bovine sequence or were not present
in the bovine crystal structure were replaced with the human residues
for those positions. The conformations of the replaced side
chains were optimized using PDA (Dahiyat and Mayo 1997a,b),
and the entire structure was minimized again for 50 steps. This
minimized structure was used as the template for all the designs.
Protein design

Analogs of hG-CSF were designed by simultaneously optimizing 
residues in the buried core of the protein using PDA. The compu- 
tational details, residue classification, potential functions, and pa- 
rameters used for van der Waals interactions, solvation, and hy- 
drogen bonding are described in previous work (Dahiyat and Mayo 
1996, 1997a). An expanded version of the backbone-dependent 
rotamer library of Dunbrack and Karplus (Dunbrack and Karplus 
1993) was used in all the calculations. The global optimum sequence
from each design was selected for characterization and
experimental testing, except for core167V in which the 21st ranked
sequence was used. Calculations were generally performed overnight
using 16 processors of an SGI Origin 2000 with 32 R10000
processors running at 195 MHz. The length of the runs varied from
1 to several hours of CPU time. 



Cloning and expression 

A gene for met hG-CSF was synthesized from partially overlapping
oligonucleotides (~100 bases) that were extended and PCR
amplified. Codon usage was optimized for E. coli and several
restriction sites were incorporated to ease future cloning. These
partial genes were cloned into a vector and transformed into E. coli
for sequencing. Several of these gene fragments were then cloned
into adjacent positions in an expression vector (pET17 or pET21)
to form the full-length gene for met hG-CSF (528 bases) and
transformed into E. coli for expression. Protein was expressed in
E. coli in insoluble inclusion bodies and its identity was confirmed by
immunoblot of SDS-PAGE using a commercial mAb against
hG-CSF.



Refolding, purification, and storage 

The protein inclusion bodies were solubilized in detergent and
refolded in the presence of CuSO4 to promote formation of native
disulfide bonds (Lu et al. 1992). A size-exclusion column (10
mm × 300 mm loaded with Superdex prep 75 resin purchased from
Pharmacia) was loaded with protein and eluted at a flow rate of 0.8
mL/min using the column buffer (100 mM Na2SO4, 50 mM Tris,
pH 7.5). The peaks were monitored at dual wavelengths of 214 nm
and 280 nm. Albumin, carbonic anhydrase, cytochrome c, and
aprotinin were used to calibrate the molecular size of proteins
versus elution time. The monomeric peak that elutes around the
expected elution time for each protein was collected and the buffer
was exchanged into 10 mM NaOAc at pH 4 for biophysical
characterization. For long-term storage, a buffer of 5% sorbitol,
0.004% Tween 80, and 10 mM NaOAc at pH 4 was used. A pH of
4 was chosen for these buffers to be consistent with the commercial
formulation of hG-CSF (Amgen), which was used as a control.
The proteins were >98% pure as judged by reversed phase high
performance liquid chromatography (HPLC) on a C4 column (3.9
mm × 150 mm) with a linear acetonitrile-water gradient containing
0.1% TFE. The identities of all proteins were confirmed by comparing
the molecular mass measured by mass spectrometry with the
corresponding molecular mass calculated using the protein
sequences.



Spectroscopic characterization 

Protein samples were 50 µM in 50 mM sodium phosphate at pH
5.5. Concentrations were determined using UV spectrophotometry. 
Protein structure was assessed by CD. CD spectra were measured 



on an Aviv 202DS spectrometer equipped with a Peltier tempera- 
ture control unit using a 1-mm path length cell. Thermal stability 
was assessed by monitoring the temperature dependence of the CD 
signal at 222 nm (Kolvenbach et al. 1997). A buffer of 10 mM 
NaOAc was used at pH 4.0 and data were collected every 2.5 °C 
with an averaging time of 5 sec and an equilibration time of 3 min. 
Thermal denaturation curves were smoothed using KaleidaGraph. 
The melting temperature (Tm) of each protein was derived from
the derivative curve of the ellipticity at 222 nm versus temperature.
The Tm values were reproducible to within 2°C for the same protein
at the concentrations used.



Storage stability 

The storage stability of the designed proteins was assessed by
incubation at both 37°C and 50°C under solution conditions identical
to those used in the commercial formulation of hG-CSF (filgrastim,
Amgen). Because aggregation and chemical degradation
are the predominant mechanisms of inactivation of G-CSF (Herman
et al. 1996), accelerated degradation was followed by observing
the disappearance of monomeric protein with both size-exclusion
and reverse-phase chromatography. Rate constants for shelf-life
estimation were determined by a first-order exponential fit of
the fraction monomer remaining/time curves using KaleidaGraph
(Synergy Software).



Cell proliferation assay 

Granulopoietic activity was measured by quantifying cell proliferation
as a function of protein concentration using Ba/F3 (murine
lymphoid) cells stably transfected with the gene encoding the human
Class 1 G-CSF receptor (Avalos et al. 1995). Cell proliferation
was detected by 5-bromo-2'-deoxyuridine (BrdU) incorporation
quantified by a BrdU-specific ELISA kit (Boehringer Mannheim).



In vivo biological activity 

Granulopoietic activity was determined in the neutropenic mouse
(Hattori et al. 1990). C57BL/6 mice were rendered neutropenic
with a single intraperitoneal injection of 200 mg/kg cyclophosphamide
(CPA). Beginning 24 h later and for 4 consecutive days, the
mice were given a daily intravenous injection of 100 µg/kg of an
hG-CSF analog, met hG-CSF produced in our laboratory, clinically
available hG-CSF (filgrastim, Amgen), or saline. On day 5,
6 h after the final dose, the animals were killed, blood samples
were collected, and granulopoietic activity was determined by
counting the number of white blood cells and polymorphonuclear
neutrophils.



Pharmacokinetics 

Plasma concentrations of a designed hG-CSF analog or wild-type
hG-CSF (filgrastim, Amgen) were determined following administration
in cynomolgus monkeys. Animals were given a single intravenous
injection of 5 µg/kg or daily subcutaneous injections of
5 µg/kg for 28 d. In the intravenous study, blood samples were
collected at 0 (predose), 5, 15, and 30 min and 1, 2, 4, 6, 8, 12, and
24 h postdosing. In the subcutaneous studies, blood samples were
collected at 0 (predose), 1, 2, 4, 6, 8, 12, and 24 h postdosing on
day 1 and day 28. All samples were immediately placed on wet ice
and centrifuged at 28°C. The resultant plasma was then frozen and
stored (-70°C). Plasma concentrations were determined using an
enzyme-linked immunosorbent assay (Quantikine human G-CSF
ELISA, R&D Systems, Minneapolis, MN), performed per manufacturer's
instructions except that samples were diluted in PBS, 5%
nonfat dry milk, and 0.05% Tween 20, and the incubation was
extended to overnight at 4°C. Plasma concentrations of the designed
hG-CSF analog and filgrastim were estimated from their
corresponding standard curves. Pharmacokinetic parameters were
calculated by noncompartmental analysis. The terminal slope (λz)
was estimated by linear regression through the last time points of
the log concentration versus time curves and used to calculate the
terminal half-life (t1/2). The area under the curve from time of
dosing through the last time point (AUC0-t) was calculated by the
linear trapezoid method.
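
The two noncompartmental quantities named above can be sketched in a few lines. This is a hedged illustration rather than the software actually used: the function names, the choice of three terminal points, and the mono-exponential concentration profile are all assumptions made for the example, and none of the numbers are study data.

```python
import math

def auc_linear_trapezoid(times, conc):
    """Area under the concentration-time curve by the linear trapezoid rule."""
    return sum((t1 - t0) * (c0 + c1) / 2.0
               for t0, t1, c0, c1 in zip(times, times[1:], conc, conc[1:]))

def terminal_half_life(times, conc, n_last=3):
    """Half-life from a least-squares line through ln(conc) for the last n_last points."""
    t = times[-n_last:]
    y = [math.log(c) for c in conc[-n_last:]]
    t_mean, y_mean = sum(t) / n_last, sum(y) / n_last
    slope = (sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
             / sum((ti - t_mean) ** 2 for ti in t))
    lambda_z = -slope  # terminal slope, reported as a positive rate
    return math.log(2) / lambda_z

# Hypothetical mono-exponential profile (illustrative, not study data).
times = [1.0, 2.0, 4.0, 6.0, 8.0, 12.0, 24.0]        # h postdose
conc = [10.0 * math.exp(-0.231 * t) for t in times]  # ng/mL, half-life near 3 h

print(f"AUC  = {auc_linear_trapezoid(times, conc):.1f} ng*h/mL")
print(f"t1/2 = {terminal_half_life(times, conc):.2f} h")
```

How many terminal points enter the regression is a judgment call in practice; the source says only that the last time points were used.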
An increasing number of engineered protein therapeutics are currently
being developed, tested in clinical trials and marketed for use. Many of 
these proteins arose out of hit-and-miss efforts to discover specific 
mutations, fusion partners or chemical modifications that confer desired 
properties. Through these efforts, several useful strategies have 
emerged for rational optimization of therapeutic candidates. The 
controlled manipulation of the physical, chemical and biological 
properties of proteins enabled by structure-based simulation is now 
being used to refine established rational engineering approaches and 
to advance new strategies. These methods provide clear, hypothesis- 
driven routes to solve problems that plague many proteins and to 
create novel mechanisms of action. We anticipate that rational protein 
engineering will shape the field of protein therapeutics dramatically by 
improving existing products and enabling the development of novel 
therapeutic agents. 
The exquisite specificity of biological thera-
peutics for their clinical targets has led to 
their continued development and application 
as medicines, despite competition from small 
molecule drugs. Several engineered protein 
therapeutics are currently being marketed 
(Table 1), and the annual sales of protein
therapeutics are projected to exceed US$59
billion in 2010, which is twice the revenue
generated in 2001
(http://www.pharmafile.com/Pharmafocus/Features/feature.asp?fID=281)
[1]. For well-validated targets, naturally
occurring protein interaction partners consti- 
tute preselected 'lead' compounds with high 
affinity and specificity. However, because 
natural proteins are not evolved for utiliza- 
tion as drugs, lead optimization is frequently 
beneficial for development of a protein 
therapeutic. Modifications can influence the 



mechanism of action, side effects and effi- 
cacy, and satisfy practical constraints such as 
production costs, intellectual property and 
dosing frequency. 

A variety of strategies have emerged for 
modulating protein properties, such as effi- 
cacy, stability, specificity, immunogenicity 
and pharmacokinetics (PK). Mechanisms for 
altering these properties include manipula- 
tion of primary structure, incorporation of 
chemical and post-translation modifications 
and utilization of fusion partners. The most 
common route to optimization is site-directed 
mutagenesis, which is often performed in a 
brute force or trial-and-error manner. A smaller 
number of examples exist whereby semira- 
tional application of diversity methods, such 
as phage display, has been used to optimize 
a therapeutic candidate. Important recent deve- 
lopments are the creation and successful 
application of rational protein design meth- 
ods and the determination of an increasing 
number of high-resolution protein structures. 

For the purposes of this review, we define 
rational protein engineering as the hypothesis- 
driven manipulation of protein sequence 
and/or composition. The controlled modifi- 
cation of specific biophysical properties of 
proteins can potentially impact a variety of 
therapeutic features (Table 2). An important 
subset of rational engineering methods consists 
of approaches that utilize high-resolution, 
3D structure information. The most sophisticated of these methods
offers an extraordinary level of control over protein sequence and
structure, a mechanism to explore sequence combinations that extends
far beyond natural diversity, and the ability to couple multiple
constraints algorithmically for simultaneous optimization of several
protein properties. Furthermore, proven hypotheses can be reapplied
to additional protein systems, thus saving discovery cost and time.
Rational methods can be distinguished from those that rely on random
sequence perturbations or combinations, such as the class of
optimization techniques referred to as directed evolution, although
some implementations of these methods have a rational component [2,3].

Table 1. Engineered protein therapeutics on the market

Name | Family | Company | Indication | Modification | Property
Proleukin® (aldesleukin) | IL-2 | Chiron | Cancer | Mutated free cysteine | Decreased aggregation; improved bioavailability
Betaseron® (interferon beta-1b) | IFN-β | Berlex/Chiron | Multiple sclerosis | Mutated free cysteine | Decreased aggregation
Humalog® (insulin lispro) | Insulin | Eli Lilly | Diabetes | Monomer not hexamer | Fast acting
NovoLog® (insulin aspart) | Insulin | Novo Nordisk | Diabetes | Monomer not hexamer | Fast acting
Lantus® (insulin glargine) | Insulin | Aventis | Diabetes | Precipitates in dermis | Sustained release
Enbrel® (etanercept) | TNF receptor | Immunex/Amgen/Wyeth | Rheumatoid arthritis | Fc fusion | Longer serum half-life; increased avidity
Ontak® (denileukin diftitox) | Diphtheria toxin-IL-2 | Seragen/Ligand | Cancer | Fusion | Targets cancer cells
PEG-Intron® (peginterferon alfa-2b) | IFN-α | Schering-Plough | Hepatitis | PEGylation | Increased serum half-life; weaker receptor binding
PEGasys® (peginterferon alfa-2a) | IFN-α | Roche | Hepatitis | PEGylation | Increased serum half-life; weaker receptor binding
Neulasta® (pegfilgrastim) | G-CSF | Amgen | Leukopenia | PEGylation | Increased serum half-life
Oncaspar® (pegaspargase) | Asparaginase | Enzon | Cancer | PEGylation | Decreased immunogenicity; increased serum half-life
Aranesp® (darbepoetin alfa) | Epo | Amgen | Anemia | Additional glycosylation sites | Increased serum half-life; weaker receptor binding
Somavert® (pegvisomant) | Growth hormone | Genentech/Seragen/Pharmacia | Acromegaly | PEGylation; binding site mutations | Novel mode of action; increased serum half-life

Chiron (http://www.chiron.com); Berlex (http://berlex.com); Eli Lilly (http://www.lilly.com); Novo Nordisk (http://www.novonordisk.com);
Aventis (http://www.aventis.com); Immunex/Amgen (http://www.amgen.com); Wyeth (http://www.wyeth.com); Seragen/Ligand (http://www.ligand.com);
Schering-Plough (http://www.sch-plough.com); Roche (http://www.roche.com); Enzon (http://www.enzon.com); Genentech (http://www.genentech.com);
Pharmacia (http://www.pharmacia.com).

Abbreviations: G-CSF, granulocyte-colony stimulating factor; IFN-α, interferon α; IL-2, interleukin 2; PEG, polyethylene glycol; TNF, tumor necrosis factor.

Physicochemical properties 

The physical and chemical properties of protein therapeutics
significantly determine their performance during development,
manufacturing and clinical use. Many therapeutically interesting
proteins are naturally expressed at low concentrations and are
degraded rapidly. By contrast, fully developed protein therapeutics
require high levels of solubility as well as retention of activity
through purification, formulation, storage and administration.
Several rational design and engineering strategies, such as those
highlighted in Figure 1, have been developed to improve properties
such as solubility and stability while maintaining desired
biological activity.

Stability 

Protein therapeutics are exposed to a variety of stresses that can
cause protein unfolding or degradation. Using rational optimization
methods, proteins can be re-engineered such that their structure and
activity are substantially more robust with respect to protease
exposure, oxidative stress and changes in temperature, pH and
solution conditions. One simple stabilization strategy is to replace
free cysteines, thereby preventing the formation of unwanted
intermolecular or intramolecular disulfide bonds. Cysteine to serine
mutations have been introduced successfully into several therapeutic
proteins, including granulocyte colony-stimulating factor (G-CSF)
and interferon (IFN) β1b, resulting in a longer shelf life [4,5].
Cysteine to serine mutations have also been shown to increase the
half-life of human fibroblast growth factor (FGF) [6]. Interestingly,
each of the three FGF mutations decreases the thermal stability of
the protein, probably because the introduced serines are
substantially desolvated and are not positioned to form
intramolecular hydrogen bonds. Rational approaches can identify the
amino acids that are more precisely compatible with the local
structural environment.

Table 2. The biophysical properties of proteins that can be optimized
to obtain desired therapeutic outcomes

Biophysical properties (rows): Stability; Solubility; Receptor
binding affinity and specificity; MHC binding affinity;
Oligomerization state; Chemical modifications; Posttranslational
modifications; Sequence diversity; Conformational state.

Therapeutic outcomes (columns): Enable discovery; Mechanism of
action; Pharmacokinetics; Immunogenicity; Route of administration;
Cost of goods; Shelf life; Intellectual property.

(The X marks linking each property to its outcomes are not legible
in this copy.)

Abbreviation: MHC, major histocompatibility complex.

Figure 1. Examples of rational design and engineering strategies.
Many design strategies target specific residues or regions of a
protein structure for optimization. However, it is important to note
that modifications in any region of the protein could potentially
affect a wide range of protein properties, emphasizing the importance
of rational design methods that can simultaneously consider and
optimize multiple parameters. Abbreviation: PEG, polyethylene glycol.
Figure labels (region: property): exposed hydrophobic residues:
solubility; binding site: interaction affinity and specificity;
termini: attachment of fusion partners or PEG; loops: protease
susceptibility; core: stability and conformational control; linear
epitopes: immunogenicity. (Figure credit: Drug Discovery Today.)

Dramatic improvements in the global stability of a 
protein can be obtained by optimizing intramolecular in- 
teractions. Early examples used rational computational 
design methods to optimize packing interactions and 
hydrophobic burial in the protein core [7-9]. Optimizing 
secondary structure propensity, hydrogen bonds and elec- 
trostatic interactions can also improve protein stability 
substantially [10-12]. More recently, these principles have 
been applied to the clinically relevant proteins G-CSF and 
human growth hormone (hGH) by using Protein Design 
Automation® (PDA™) technology (Box 1). The designed 
hGH variants are active in cell proliferation studies and are 
up to 16°C more thermostable than the wild type protein 
[13]. Optimized G-CSF variants with 10-14 mutations 
display enhanced thermal stability and five to tenfold in- 
creases in shelf life while maintaining the desired biological 
activity [14]. 

An additional stabilization strategy is reduction of proteolytic
susceptibility. If a specific site in the protein is known to be
especially prone to proteolysis, it can be modified so that it no
longer matches the substrate specificity of the putative protease.
The protease cleavage sites are often located in flexible loops;
therefore, another approach is to introduce mutations that decrease
flexibility. Thrombolytics are a class of protein therapeutics for
which proteolytic susceptibility is especially important because many
clotting factors are both activated and inactivated by specific
proteases. For example, an engineered variant of coagulation factor
VIIIa with increased resistance to proteolytic inactivation was
generated by mutating two arginines required for cleavage by
thrombin, factor Xa and activated protein C [15].

Box 1. Xencor's PDA™ technology: a state of the art rational
engineering platform

PDA™ technology, originally developed at Caltech [10,66,67] and
further optimized at Xencor [13,14], couples computational design
algorithms that generate quality sequence diversity with experimental
high-throughput screening to discover proteins with improved
properties. The computational component uses atomic level scoring
functions, side chain rotamer sampling and advanced optimization
methods to capture the relationships between protein sequence,
structure and function accurately. Calculations begin with the 3D
structure of the protein and a strategy to optimize one or more
properties of the protein. PDA™ technology then explores the sequence
space comprising all pertinent amino acids (including unnatural amino
acids, if desired) at the positions targeted for design. This is
accomplished by sampling conformational states of allowed amino acids
and scoring them using a parameterized and experimentally validated
function that describes the physical and chemical forces governing
protein structure. Powerful combinatorial search algorithms are then
used to search through the initial sequence space, which can
constitute 10^[exponent illegible in source] sequences or more, and
quickly return a tractable number of sequences that are predicted to
satisfy the design criteria. Useful modes of the technology span from
combinatorial sequence design to prioritized selection of optimal
single site substitutions. PDA™ technology has been applied to
numerous systems including important pharmaceutical and industrial
proteins and has a demonstrated record of success in protein
optimization.

Solubility

Protein therapeutics are typically expressed, formulated and
administered at high concentrations. Under such conditions many
proteins form inclusion bodies during expression or aggregates after
formulation. Improving the solubility of a protein can facilitate
discovery efforts, whereas enabling soluble prokaryotic expression
can reduce production costs dramatically and increase yields. It is
far more critical to ensure the solubility of a protein therapeutic
once it is administered. Aggregation can cause decreased activity,
decreased bioavailability and increased immunogenicity. Several
strategies have been applied successfully to reduce protein
aggregation and enable soluble expression. Replacement of unpaired
cysteine residues can prevent the formation of unwanted
intermolecular disulfide bonds, as described above.
Post-translational and chemical modifications, which are discussed in
a later section, can also help to prevent aggregation. Substituting
exposed nonpolar residues with polar residues can enable soluble
expression and improve the solubility of the purified protein. This
strategy was applied successfully to the A1 domain of cholera toxin,
a powerful adjuvant. Of the six variants produced, one retained full
biological activity and stability and also displayed a significant
improvement in solubility [16]. Altering the net charge and
isoelectric point (pI) of a protein can also affect its solubility.
For example, a single chain antibody targeting renal cell carcinoma
was altered to increase solubility by adding five glutamic acid
residues to the C-terminus, thus lowering the pI from 7.5 to 6.1
[17]. Although there are a few examples of rational solubility
engineering, the majority of the published successes in solubility
optimization have been anecdotal. Until now solubility obstacles have
been more or less considered to be formulation problems that can be
surmounted with an exhaustive protein chemistry effort. We anticipate
that systematic structure-guided optimization efforts will lead to
the emergence of well-defined strategies that consistently yield
proteins with improved solubility and minimally perturbed structure
and function.
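
The effect of a charged tag on the isoelectric point, as in the
glutamic acid example above, can be illustrated with a rough
calculation. The sketch below is not the method used in [17]; it
assumes a simple Henderson-Hasselbalch model with approximate,
textbook-style pKa values, treats every cysteine as free, and uses a
made-up sequence purely for illustration.

```python
# Approximate pKa values (assumed for illustration; published tables differ slightly).
POSITIVE_PKA = {"K": 10.5, "R": 12.5, "H": 6.0}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}
N_TERM_PKA = 9.0
C_TERM_PKA = 2.3


def net_charge(sequence, ph):
    """Estimate the net charge of a peptide at a given pH."""
    # Free N-terminus (positive when protonated) and C-terminus (negative when deprotonated).
    charge = 1.0 / (1.0 + 10.0 ** (ph - N_TERM_PKA))
    charge -= 1.0 / (1.0 + 10.0 ** (C_TERM_PKA - ph))
    for residue in sequence:
        if residue in POSITIVE_PKA:
            charge += 1.0 / (1.0 + 10.0 ** (ph - POSITIVE_PKA[residue]))
        elif residue in NEGATIVE_PKA:
            charge -= 1.0 / (1.0 + 10.0 ** (NEGATIVE_PKA[residue] - ph))
    return charge


def isoelectric_point(sequence, tolerance=1e-4):
    """Find the pH of zero net charge by bisection (charge falls as pH rises)."""
    low, high = 0.0, 14.0
    while high - low > tolerance:
        mid = (low + high) / 2.0
        if net_charge(sequence, mid) > 0.0:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


# Hypothetical sequence, not taken from the article.
base = "MKKRHGSAKTLVDEKR"
tagged = base + "EEEEE"  # five C-terminal glutamic acids
print(round(isoelectric_point(base), 2), round(isoelectric_point(tagged), 2))
```

In this model each added glutamic acid contributes negative charge at
every pH, so the tagged sequence always has the lower estimated pI;
the size of the shift depends on the rest of the sequence.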

Pharmacokinetics 

The first generation of protein therapeutics frequently 
suffered from poor PK. As our understanding of protein 
clearance processes improves, it becomes possible to 
rationally modify proteins to tailor their PK profiles. 
Properly controlling the serum concentration of a thera- 
peutic protein over time can lead to improved efficacy 
and decreased side effects. In fact, improvements in PK 
properties can be so vital to the efficacy of a protein drug 
that they are often made at the expense of specific activ- 
ity. In addition to eliminating proteolytic susceptibility 
(see above), several strategies have been developed to 
alter PK, including polyethylene glycol (PEG) attachment
(PEGylation), glycosylation, fusion to proteins with long serum
half-lives, alteration of oligomerization state and modulation of
receptor-mediated uptake and
turnover. Knowledge of the dominant route or routes of
elimination for a given protein can help significantly in 
determining which of these strategies will be the most ap- 
propriate. For low molecular weight protein therapeutics, 
kidney filtration dominates, encouraging modifications 
that increase the effective size. In the case of ligand-re- 
ceptor systems PK often depends on the relative influence 
of receptor-mediated clearance versus renal clearance. 
Affinity and specificity modifications are a central com- 
ponent of many therapeutic optimization strategies, and 
thus receptor-mediated clearance might play an important 
role in the efficacy of many proteins, even when it is not 
considered explicitly. 

Fusion proteins 

In a straightforward application of molecular size manipulation,
proteins covalently fused to themselves often display significantly
improved PK profiles [18]. The serum half-life of a therapeutic
protein can be increased more dramatically through fusion to a
protein that is known to have a long serum half-life, typically
albumin or the Fc region of antibodies. Amgen and Wyeth's Enbrel®
(etanercept), which is currently marketed for the treatment of
rheumatoid arthritis, is a fusion protein consisting of the
extracellular domain of p75 tumor necrosis factor receptor (TNFR) and
the Fc domain of human IgG. Fc increases the serum half-life of
Enbrel®, presumably by both increasing its size and mediating
endosomal recycling (see below). Furthermore, because Fc is a dimer,
Enbrel®'s affinity for TNF-α is 50- to 1000-fold higher than the
affinity of monomeric TNFR [19]. Albumin fusions have been used to
generate variants of the anticoagulant proteins hirudin [20] and
barbourin [21]. An interesting twist on this approach is to tag
proteins with a peptide sequence that specifically binds albumin.
Addition of an albumin-binding peptide tag to the anti-tissue factor
D3H44 Fab increases its half-life by approximately 40-fold [22].

Alteration of oligomerization state 

The rate of absorption after injection can be affected by the
molecular weight and solubility of a protein. An interesting example
is provided by comparing wild type insulin, fast-acting insulin
variants and sustained-release insulin variants. Native insulin forms
a mixture of dimers and hexamers. The fast-acting insulin variants
produced by Eli Lilly and Novo Nordisk, Humalog® (insulin lispro) and
NovoLog® (insulin aspart) respectively, contain mutations that
decrease oligomerization and, therefore, increase the rate of
absorption. As a result, patients can administer these fast-acting
insulin variants at mealtimes rather than 1 hour before, as was
required with native insulin. Long-acting insulin variants are used
to maintain steady basal insulin levels. For example, the Aventis
product Lantus® (insulin glargine) was engineered by increasing the
pI to promote precipitation upon subcutaneous injection, thus slowing
the rate of absorption [23].

PEGylation 

PEG is a highly flexible and soluble polymer that has 
gained widespread scientific and regulatory acceptance as 
a chemical modification for therapeutic proteins. PEGylation 
improves PK predominantly by increasing the effective size 
of a protein, with most significant effects for proteins 
smaller than 70 kDa [24,25]. PEGylation can also reduce 
immunogenicity and aggregation [26]. Although a variety 
of chemistries exist [27,28] for coupling PEGs of various 
sizes to proteins, the greatest attachment specificity gener- 
ally arises from PEGylation at the N-terminus or unpaired 
cysteines. 

Several PEGylated protein therapeutics, such as Schering- 
Plough's PEG-Intron® (peginterferon alfa-2b) and Roche's 
PEGasys® (peginterferon alfa-2a), are currently on the 
market or in late-stage clinical trials. PEGasys® exhibits a 
50- to 70-fold increase in serum half-life and substantially 
reduced variability in serum concentration [29]. A common
negative effect of PEGylation, exemplified by both
PEGylated IFNs [29,30], is a loss of specific activity. Future
studies on these and other proteins should, therefore, focus 
on minimizing activity loss by optimizing the sites and 
sizes of PEG attachment rationally. 
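
To make concrete what a fold-increase in serum half-life means for
exposure, the following sketch assumes the simplest textbook picture,
a single compartment with first-order elimination. That model is an
assumption of this example and is not stated in the article; the
half-life values are hypothetical and chosen only to show the
scaling.

```python
import math


def fraction_remaining(t_hours, half_life_hours):
    """Fraction of the initial serum concentration left after t_hours,
    assuming one-compartment, first-order elimination."""
    return 0.5 ** (t_hours / half_life_hours)


def time_above(threshold_fraction, half_life_hours):
    """Hours until the concentration falls to threshold_fraction of its
    initial value under the same assumption."""
    return half_life_hours * math.log2(1.0 / threshold_fraction)


# Hypothetical numbers: a 4 h half-life and a 50-fold longer one.
native_half_life = 4.0
modified_half_life = 50.0 * native_half_life

for name, half_life in (("native", native_half_life), ("modified", modified_half_life)):
    hours = time_above(0.10, half_life)
    print(f"{name}: above 10% of the initial level for {hours:.1f} h")
```

Under this assumption the time spent above any fixed fraction of the
starting concentration scales linearly with half-life, which is one
way a longer-lived molecule can compensate for lower specific
activity.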

Glycosylation 

Site-specific incorporation of glycosylation sites serves as an
additional approach for improving PK. A notable example is Amgen's
hyperglycosylated erythropoietin (Epo) variant Aranesp® (darbepoetin
alfa), engineered to contain two additional N-linked glycosylation
sites. The additional glycosylation increases the serum half-life
threefold while reducing in vitro binding roughly fourfold [31].
Thus, Aranesp® is another example of how modification can improve in
vivo efficacy, despite reducing specific activity. Accordingly,
future efforts could benefit from using rational methods to identify
N-linked or O-linked glycosylation sites that best maintain the
structural and functional properties of the protein.

Endocytic trafficking

The PK of many proteins that bind cell-surface receptors can also be
affected by endocytic trafficking. Cell-surface receptors and bound
ligands are continually internalized by endocytosis. The receptors
and ligands can be recycled back to the surface, degraded in
lysosomes or transported across cells (e.g. from the apical membrane
to the basolateral membrane). The fate of the ligand is often
determined by the extent of association with the receptor within the
endosome, although the relationship between processing and
association is highly system dependent. The pH in endosomal
compartments is lower than that of serum. Because protein-protein
interactions are typically pH-dependent, many ligands are freed from
their receptors as they proceed through the endosomal pathway. In
some protein families released ligand is recycled, whereas ligand
that remains bound is targeted for degradation. For example, the more
tightly epidermal growth factor (EGF) family ligands bind EGF
receptor at pH 6, the lower the fraction of ligand that is recycled
[32]. Recently Lauffenburger and colleagues have taken advantage of
the pH dependence of endosome-mediated G-CSF turnover to rationally
engineer variants with improved PK. Several residues involved in
receptor binding were mutated to histidine, which is largely
uncharged in serum but has a net positive charge in the acidic
environment of endocytic vesicles. The variants were predicted to
bind receptor normally at the cell surface but to release more
effectively than the wild type after endocytosis. Two of the
resulting variants were shown to have increased half-life and potency
compared with wild type G-CSF [33]. This elegant example illustrates
the ability of rational engineering methods to use accumulated
biological knowledge to generate improved therapeutics. Another
important example, discussed below in more detail, is the
pH-dependent recycling of immunoglobulin Fc domains. In this case the
effect is opposite: the pH drop purges antigen from the variable
region while enhancing Fc binding to its receptor, thus enabling the
antibody and its receptor to be recycled to the serum.
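
The histidine switch described above rests on simple acid-base
arithmetic. As a rough illustration, the sketch below applies the
Henderson-Hasselbalch relation to an isolated histidine side chain.
The pKa of 6.0 is an assumed, typical value; in a folded protein the
effective pKa of a given histidine can differ, and the pH values are
illustrative rather than measurements from [33].

```python
def protonated_fraction(ph, pka=6.0):
    """Fraction of a titratable basic group that carries its proton
    (and hence a positive charge) at the given pH."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# Illustrative pH values: serum-like, early endosome-like, late endosome-like.
for label, ph in (("serum", 7.4), ("endosome", 6.0), ("acidic vesicle", 5.5)):
    print(f"{label:>14} pH {ph}: {protonated_fraction(ph):.2f} protonated")
```

With these assumptions only a few percent of histidine side chains
are charged at serum pH, while half or more are charged at endosomal
pH, which is the difference the engineered variants exploit.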

Affinity, specificity and conformational control 

Rational design can be used to modify the affinity and 
specificity of interactions between a therapeutic protein 
and other biomolecules. In some cases, increasing the 
binding affinity for a target protein can produce an in- 
crease in biological activity. In other cases, it is possible 
to reduce undesired biological activities by decreasing 
the affinity for nontarget molecules. An example of af- 
finity enhancement is the generation of superagonist 
variants of human thyrotropin (hTSH) by altering the 
net charge of the protein. The hTSH receptor has a net 
negative charge, and mutations that introduce positively 
charged residues or replace negatively charged residues 
in the peripheral loops of hTSH increase activity. The 
best variants show a 50,000-fold increase in receptor 
binding affinity and 1000-fold increase in in vivo activity 
[34,35].



The power of rational design is most impressive when it is used to
generate novel mechanisms of action. For example, 4-helix bundle
cytokines, including vascular endothelial growth factor (VEGF), hGH
and interleukin-6 (IL-6), have been engineered to function as
receptor antagonists rather than agonists. Most members of the
4-helix bundle cytokine family must form multiple protein-protein
interactions at the cell surface to trigger signaling. VEGF, for
example, forms homodimers that bind to two VEGF receptors, whereas
IL-6 binds to a low-affinity IL-6 co-receptor and gp130. Antagonistic
VEGF variants were designed as heterodimers, which contain one
functional binding site per dimer [36]. An IL-6 superantagonist was
generated by selecting mutations that disrupt binding to gp130 and
incorporating mutations that result in increased affinity for the
IL-6 co-receptor [37]. An especially interesting example of a
designed cytokine antagonist is Genentech/Pharmacia's Somavert®
(pegvisomant), an hGH variant that has recently successfully
completed clinical trials for treatment of acromegaly. hGH contains
two distinct receptor binding sites and dimerizes its receptor upon
binding. Somavert® contains a point mutation at the second receptor
binding site that blocks receptor dimerization [38] and eight
additional mutations, identified by phage display, that increase the
receptor-binding affinity of the first site [39].

Many proteins undergo conformational changes that 
are central to their function. In such cases, rational design 
methods can drive conformational equilibria towards the 
therapeutically desirable state. A notable example is the 
design of constitutively active and inactive integrin I domain 
variants. Integrin I domains can populate two dominant 
conformations: an 'open' conformation, which can bind 
intercellular adhesion molecule-1 (ICAM-1), and a 'closed'
conformation, which has very low affinity for ICAM-1. The
native protein rests in the closed conformation and con- 
verts to the open conformation during signaling. Springer 
and coworkers used two distinct strategies to generate con- 
formationally locked integrin I domain variants. One ap- 
proach introduced pairs of cysteines that form disulfide 
bonds compatible with either the closed or open confor- 
mation [40,41]. In the second approach, mutations were 
designed in the core of the domain that were computation- 
ally selected to stabilize the open conformation and disallow 
the closed state [42].

Immunogenicity 

The potential for protein therapeutics to produce harmful immune
responses is a significant barrier to the development and acceptance
of protein drugs. The immune response is typically most severe for
nonhuman proteins. For example, antibodies against streptokinase, a
bacterially derived thrombolytic used to treat myocardial infarction,
not only neutralize the protein and reduce its efficacy, but also can
elicit severe allergic reactions that effectively limit streptokinase
therapy to one-time use. Yet even therapeutics based on human
proteins can cause immune responses depending on the mode of
administration (including dosage, frequency and route) and the
solubility and stability of the formulated protein. Neutralizing
antibodies have been observed against a variety of human proteins
including insulin, factor VIII, IFNs, Epo and megakaryocyte growth
and differentiation factor (MGDF). In some cases, for example with
the multiple sclerosis drug IFN-β, efficacy is severely hindered due
to neutralizing antibodies [43]. Devastating problems can result when
elicited antibodies crossreact with endogenous protein. For example,
clinical trials of MGDF were halted when crossreactive neutralizing
antibodies to endogenous thrombopoietin caused reduced platelet
counts (thrombocytopenia) in a small number of otherwise healthy
volunteers. As another example, Johnson and Johnson's European
formulation of Epo, Eprex® (epoetin alfa), has caused pure red blood
cell aplasia in several patients owing to the formation of
crossreactive neutralizing antibodies.

Table 3. Engineered antibodies on the market

Name | Company | Target | Indication | Type
Orthoclone OKT3® (muromonab-CD3) | Ortho Biotech/Johnson & Johnson | CD3 | Transplant rejection | Murine
ReoPro® (abciximab) | Centocor/Lilly | GPIIb/IIIa | Restenosis | Chimeric
Rituxan® (rituximab) | IDEC/Genentech | CD20 | B-cell non-Hodgkin's lymphoma | Chimeric
Simulect® (basiliximab) | Novartis | IL-2R | Transplant rejection | Chimeric
Remicade® (infliximab) | Centocor | TNF-α | Crohn's disease, rheumatoid arthritis | Chimeric
Zevalin® (ibritumomab tiuxetan) | IDEC/Schering AG | CD20 | B-cell non-Hodgkin's lymphoma | Chimeric
Zenapax® (daclizumab) | PDL/Roche | IL-2R | Transplant rejection | Humanized
Synagis® (palivizumab) | MedImmune | RSV F protein | Respiratory syncytial virus | Humanized
Herceptin® (trastuzumab) | Genentech | HER2/neu | Breast cancer | Humanized
Mylotarg® (gemtuzumab ozogamicin) | Celltech/Wyeth | CD33 | Acute myeloid leukemia | Humanized
Campath® (alemtuzumab) | Millennium/ILEX | CD52 | B-cell chronic lymphocytic leukemia | Humanized

Ortho Biotech (http://www.orthobiotech.com); Johnson & Johnson (http://www.jnj.com); Centocor (http://www.centocor.com); Eli Lilly (http://www.lilly.com);
IDEC (http://www.idec.com); Genentech (http://www.genentech.com); Novartis (http://www.novartis.com); Schering AG (http://www.schering.de/eng/);
PDL (http://www.pdl.com); Roche (http://www.roche.com); MedImmune (http://www.medimmune.com); Celltech (http://www.celltechgroup.com);
Wyeth (http://www.wyeth.com); Millennium (http://www.mlnm.com); ILEX (http://www.ilexonc.com).

Abbreviations: GPIIb/IIIa, platelet glycoprotein IIb/IIIa; IL-2R, interleukin 2 receptor; RSV F, respiratory syncytial virus F; TNF-α, tumor necrosis factor α.

The application of rational engineering to immuno- 
genicity has been aimed mostly at increasing the anti- 
genicity of proteins for use in vaccines. Immune reduction 
of proteins as a whole is not as straightforward, and relatively 
few examples exist of rationally reducing or eliminating
the immunogenicity of protein therapeutics. The only real
success for immunogenicity reduction has been the hu- 
manization of murine antibodies, made possible by the 
high regularity of antibody sequence and structure and the 
ability to use proximity to human sequence as a metric for 
immunogenicity. In some cases, PEGylation has reduced 
the fraction of patients who raise neutralizing antibodies, 
possibly by sterically blocking access to epitopes [44]. 
Rational design methods that improve the solution properties 
of a protein therapeutic might also reduce immunogenicity 
because aggregates are generally more immunogenic than 
soluble proteins. 

A more general approach to de-immunization involves 
mutagenesis of epitopes in the protein sequence and struc- 
ture that are most responsible for stimulating the immune 
system. Some success has been achieved by randomly re- 
placing surface residues, thus generating sequences with 
lower affinity for panels of known neutralizing antibodies 
[45,46]. An alternate approach is to disrupt T-cell activation 
by mutating peptides that bind class II major histocompa- 
tibility complex (MHC) alleles. Removal of MHC-binding 
epitopes offers a much more tractable approach to de- 
immunization than the removal of antibody epitopes because 
the diversity of MHC molecules comprises only 1-2 × 10^3
alleles, whereas the antibody repertoire is estimated to be
approximately 10^[exponent illegible in source]. A current challenge
for rational design methods is to identify sequence variants that
eliminate potential MHC-binding epitopes while maintaining protein
structure and function. We anticipate that a general solution
to the problem of protein immunogenicity will improve 
the safety and efficacy of protein therapeutics substantially 
and will enable new classes of nonhuman and de novo 
proteins to enter the clinic. 
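
One reason epitope removal interacts so strongly with structure and
function is that linear epitopes overlap. The sketch below enumerates
the overlapping peptide windows in a sequence and lists which windows
a single point mutation would alter. The nine-residue window is an
assumption of this example (a commonly used core length for class II
MHC binding), the sequence is made up, and no binding prediction is
attempted.

```python
def peptide_windows(sequence, k=9):
    """Return (start, peptide) for every overlapping k-residue window."""
    return [(i, sequence[i:i + k]) for i in range(len(sequence) - k + 1)]


def windows_containing(sequence_length, position, k=9):
    """Start indices of all k-residue windows that include the given
    0-based position, i.e. the windows a mutation there would change."""
    first = max(0, position - k + 1)
    last = min(position, sequence_length - k)
    return list(range(first, last + 1))


# Hypothetical 20-residue sequence, not taken from the article.
sequence = "MKTAYIAKQRQISFVKSHFS"
print(len(peptide_windows(sequence)))          # number of 9-mer windows
print(windows_containing(len(sequence), 10))   # windows touched by a mutation at position 10
```

A substitution in the interior of the sequence changes up to nine
candidate peptides at once, so a mutation chosen to disrupt one
epitope also has to be checked against its neighbors and against the
structural role of that residue.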

Antibodies 

Some of the most visible and successful applications of ra- 
tional engineering methods to biotherapeutics have oc- 
curred in the field of antibodies. Monoclonal antibodies 
are widely used as treatments for a variety of conditions 
from arthritis to cancer. There are currently 11 antibody
products on the market, as shown in Table 3, and well
over 100 in development. Despite such widespread accep- 
tance and promise, there is still a need for structural and 
functional antibody optimization. Current antibody engi- 
neering efforts target both the variable and Fc regions of 
the molecule. 

Antibody variable domains suffer from stability and solu- 
bility issues similar to all proteins, as discussed previously. 
However, because antibodies share a common structural 
scaffold, rational engineering studies have been able to 
dissect some of the sequence and structural determinants 
of variable region solubility and stability 
[47]. Notable developments include the 
structure-based design of more finely 
tuned complementarity determining 
region grafts and libraries [48,49], the 
use of phage-based selection methods for 
humanization [50,51] or fully human 
antibody generation [52] and the application of
computational methods to increase the
association rate of antibody/antigen
formation at predicted 'ON-Rate
AMPlification Sites' (Marvin and Lowman,
pers. commun.). Furthermore, owing to
the modular nature of immunoglobulin 
domains, variable domain architectures, 
such as diabodies, triabodies and bispe- 
cific diabodies, are being engineered to 
better serve specific therapeutic applica- 
tions [53,54]. For more detail on variable 
region engineering the reader is referred 
to an excellent review by Maynard and 
Georgiou [53]. 

The Fc region of an antibody mediates 
interactions with several receptors, thus 
allowing antibodies to recruit the im- 
mune system and possess an extended 
serum half-life [55,56]. Significant effort 
has gone into engineering Fc for enhanced functional properties.
Most exciting are recent results indicating that tighter binding by
Fc to certain Fc gamma receptors, obtained by mutagenesis [57] or
expression of carbohydrate isoforms [58,59], can result in enhanced
effector function, potentially enabling the engineering of more
potent antitumor antibodies. Additionally, some success has been
achieved in modulating antibody PK by generating Fc variants with
altered affinity for the neonatal receptor FcRn [60,61]. The
bottleneck for Fc engineering is production. Because of the
requirement for glycosylation, Fc and full-length antibodies must be
produced in mammalian systems, precluding screening of large numbers of
variants. Engineering a system with such high therapeutic 
potential yet limited screening capacity will be an exciting 
challenge for rational protein design. 

State of the art rational engineering 

The numerous examples discussed in this review illustrate 
both the demand for and power of rational engineering 
methods to improve the efficacy of biotherapeutics. There 
is currently an opportunity to replace the typical hit-and- 
miss approach to protein optimization with quantitative 
and systematic engineering strategies using computational 
protein design methods [62-65]. Xencor's PDA technology 
is an example of these new methods (Box 1). 

Figure 2. Multiparameter optimization of proteins. The modification of linear 
sequence epitopes is useful for modulating immunogenicity, altering proteolytic 
stability, introducing chemical modification sites and other strategies. However, 
these changes have a high likelihood of disrupting the integrity of the structure 
and function of the protein. It can be difficult to experimentally select for 
multiple properties simultaneously, such as nonimmunogenic and active variants. 
Rational design approaches are uniquely suited to the identification of protein 
sequences that satisfy multiple constraints. Abbreviation: MHC, major 
histocompatibility complex. 

The full potential of computational design algorithms is 
realized when they are followed by high-throughput 
experimental screening efforts to single out superior mem- 
bers of a protein library. Computationally generated li- 
braries are significantly enriched in stable, properly folded 
sequences relative to randomly generated libraries. In 
effect, structure-based sequence sampling methods yield 
an increased hit-rate, thereby decreasing the number of 
variants that must be screened. This feature is often critical 
to success because screens for therapeutic proteins, such 
as cell-based or in vivo assays, are often extremely low 
throughput. Given a high quality library, experimental 
screening methods can identify the sequence or sequences 
with the best characteristics quickly. 
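
The link between hit-rate and screening effort can be made concrete with a small calculation. The sketch below is illustrative only: it assumes each library member is an independent hit with the same probability, and the hit rates used (0.1% for a random library, 5% for a designed one) are hypothetical values chosen for the example, not figures from this review.

```python
import math

def variants_to_screen(hit_rate, confidence):
    """Smallest number of variants to screen so that the chance of
    seeing at least one hit reaches `confidence`, assuming every
    variant is an independent hit with probability `hit_rate`."""
    if not (0.0 < hit_rate < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("hit_rate and confidence must lie strictly between 0 and 1")
    # P(at least one hit in n) = 1 - (1 - hit_rate)**n >= confidence
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - hit_rate))

# Hypothetical hit rates, for illustration only.
random_library = variants_to_screen(0.001, 0.95)
designed_library = variants_to_screen(0.05, 0.95)
print(random_library, designed_library)
```

Under these assumptions the required screen shrinks roughly in proportion to the gain in hit-rate, which is why enrichment matters most when the assay is cell-based or in vivo.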

Computational design algorithms have tremendous po- 
tential for addressing conflicting constraints on a protein's 
sequence and structure, a common challenge in protein 
optimization efforts. As illustrated in Figure 2, many strat- 
egies (e.g. introduction of chemical or post-translational 
modification sites, removal of proteolysis sites and removal 
of MHC epitopes) require modifications to local primary 
structure. In most cases, however, the effect of these 
changes on the tertiary structure and functional integrity 
of the protein must also be considered. In other cases, one 
seeks primary structure alterations that disrupt one inter- 
action while preserving a multiplicity of other interactions. 
Unfortunately, the number of acceptable sequence solu- 
tions narrows dramatically as the number of constraints is 
increased. One costly solution to this general problem is to 
develop assays that assess compatibility with each con- 
straint separately. Alternatively, computational algorithms 
can simultaneously consider most or all of the constraints 
in the context of the whole protein. This approach also 
affords the opportunity to discover compensatory mutations 
elsewhere in the protein to accommodate changes made at 
the primary optimization site. 

Conclusions 

To convert a typical endogenous protein to a successful 
therapeutic it is often necessary to optimize several para- 
meters, such as stability, solubility, PK and immunogenic- 
ity, while preserving or even enhancing function. Many 
strategies have already emerged for perturbing these para- 
meters. We anticipate that the continued development and 
application of rational protein design technology will 
enable significant improvements in the efficacy and safety 
of existing protein therapeutics, as well as allow the gener- 
ation of entirely novel classes of proteins and modes of 
action. 

Not long ago, it seemed inconceivable that 
proteins could be designed from scratch. Be- 
cause each protein sequence has an astro- 
nomical number of potential conformations, 
it appeared that only an experimentalist with 
the evolutionary life span of Mother Nature 
could design a sequence capable of folding 
into a single, well-defined three-dimensional 
structure. But now, on page 82 of this issue, 
Dahiyat and Mayo (1) describe 
a new approach that makes de 
novo protein design as easy as 
running a computer program. 
Well almost. . . 

The intellectual roots of this 
new work go back to the early 
1980s when protein engineers 
first thought about designing 
proteins (2). At that point, the 
prediction of a protein's three- 
dimensional structure from its 
sequence alone seemed a diffi- 
cult proposition. However, they 
opined that the inverse problem (designing 
an amino acid sequence capable of assuming 
a desired three-dimensional structure) 
would be a more tractable 
problem, because one could 
"over-engineer" the system to fa- 
vor the desired folding pattern. 
Thus, the problem of de novo protein design 
reduced to two steps: selecting a desired ter- 
tiary structure and finding a sequence that 
would stabilize this fold. Dahiyat and Mayo 
have now mastered the second step with spec- 
tacular success. They have distilled the rules, 
insights, and paradigms gleaned from two de- 
cades of experiments (3) into a single compu- 
tational algorithm that predicts an optimal 
sequence for a given fold. Further, when put to 
the test the algorithm actually predicted a 
sequence that folded into the desired three- 
dimensional structure. Thus, the rules of pro- 
tein folding and computational methods for 
de novo design may now be sufficiently de- 
fined to allow the engineering of a variety of 
proteins. 

Dahiyat and Mayo's program divides the 
interactions that stabilize protein structures 
into three categories: interactions of side 
chains that are exposed to solvent, of side 
chains buried in the protein interior, and of 
parts of the protein that occupy more interfa- 
cial positions. Exposed residues contribute to 
stability, primarily through conformational 
preferences and weakly attractive, solvent- 
exposed polar interactions (4). The burial of 
hydrophobic residues in the well-packed in- 




Better than the real thing. The natural zinc finger protein Zif268 (left) is 
stabilized in part by a core of hydrophobic (green) side chains and metal- 
chelating side chains (red). In the designed protein FSD-1 (right), the 
Zif268 core is retained but the metal-chelating His residues and one of the 
Cys residues of Zif268 are converted to hydrophobic Phe and Ala resi- 
dues, thereby extending the hydrophobic core. The fourth metal ligand 
Cys is converted to a Lys residue. The apolar portion of this interfacial 
residue shields the hydrophobic core, whereas its ammonium group is ex- 
posed to solvent. The helix is also stabilized by an N-capping interaction 
(19), which presumably also stabilizes the structure. 



terior of a protein provides an even more 
powerful driving force for folding. The side 
chains in the interior of a protein adopt 
unique conformations, the prediction of 
which is a large combinatorial problem. 

One important simplifying assumption 
arose from the early work of Janin et al. (5), 
who showed that each individual side chain 
can adopt a limited number of low-energy 
conformations (named rotamers), reducing 
the number of probable conformers available 
to a protein. This work was subsequently ex- 
tended to the design of proteins containing 
only the most favorable rotamers (6). Al- 
though the side chains in natural proteins 
deviate from ideality in a few cases (compli- 
cating the prediction of the structures of 
natural proteins), these deviations need not 
be considered in the design of idealized pro- 
teins. Thus, various algorithms have been 
developed to examine all possible hydropho- 
bic residues in all possible rotameric states, to 
find combinations that efficiently fill the in- 
terior of a protein. A complementary ap- 
proach uses genetic methods to exhaustively 
search for sequences capable of filling a pro- 
tein core (7), and this work has been adapted 
for the de novo design of proteins (8). 

Interfacial residues are also quite im- 
portant for protein stability (9, 10). They 
are often amphiphilic (for example, Lys, 
Arg, and Tyr) and their apolar atoms can 
cap the hydrophobic core, while their po- 
lar groups engage in electrostatic and hy- 
drogen-bonded interactions. 

Until recently, protein designers have fre- 
quently concentrated on quantifying the en- 
ergetics associated with just one of these three 
types of interactions (3). However, de novo 
design is best approached by simultaneously 
considering all of the side chains in the pro- 
tein, which is unfortunately a very high-order 
combinatorial problem. For instance, the volume 
available to the interior side chains depends on 
the nature and conformation of the residues at 
the interfacial positions and vice versa. Dahiyat 
and Mayo assumed that each of these three 
features had been adequately quantitated 
to provide a useful empirical energy function 
for protein design. Their program combines a 
number of features taken from earlier 
potential functions and includes a penalty for 
exposing hydrophobic groups to solvent. Another 
essential innovation included in their program is 
an implementation of the Dead-End Elimination 
theorem to efficiently search through sequence 
and side chain rotamer space. 
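
To give a feel for how a dead-end elimination pass prunes rotamer space, here is a minimal sketch of the original single-rotamer criterion. It is not Dahiyat and Mayo's program: the function names, the data layout (`singles[i][r]` for rotamer/backbone energies, `doubles[i][j][r][s]` for rotamer/rotamer energies) and the toy energies are all assumptions made for illustration. A rotamer is discarded when its best-case energy is still worse than the worst-case energy of a competitor at the same position.

```python
from itertools import product

def dee_prune(singles, doubles):
    """Return, per position, the set of rotamers that survive repeated
    single-rotamer dead-end elimination."""
    n = len(singles)
    alive = [set(range(len(e))) for e in singles]
    changed = True
    while changed:
        changed = False
        for i in range(n):
            others = [j for j in range(n) if j != i]
            for r in list(alive[i]):
                # Best case for r: lowest pair energy at every other position.
                best_r = singles[i][r] + sum(
                    min(doubles[i][j][r][s] for s in alive[j]) for j in others)
                for t in alive[i] - {r}:
                    # Worst case for competitor t: highest pair energy everywhere.
                    worst_t = singles[i][t] + sum(
                        max(doubles[i][j][t][s] for s in alive[j]) for j in others)
                    if best_r > worst_t:
                        alive[i].discard(r)  # r cannot be in the global optimum
                        changed = True
                        break
    return alive

def total_energy(assign, singles, doubles):
    """Energy of one full rotamer assignment (one rotamer per position)."""
    n = len(assign)
    e = sum(singles[i][assign[i]] for i in range(n))
    e += sum(doubles[i][j][assign[i]][assign[j]]
             for i in range(n) for j in range(i + 1, n))
    return e

# Toy system (hypothetical energies): 3 positions with 2, 3 and 2 rotamers.
n_rot = [2, 3, 2]
singles = [[0.0, 5.0], [1.0, 0.5, 6.0], [0.0, 4.0]]
doubles = [[None if i == j else
            [[0.1 * ((r + s) % 2) for s in range(n_rot[j])]
             for r in range(n_rot[i])]
            for j in range(3)] for i in range(3)]

alive = dee_prune(singles, doubles)
best = min(product(*[range(k) for k in n_rot]),
           key=lambda a: total_energy(a, singles, doubles))
print(alive, best)
```

In this toy case the pruning leaves one rotamer per position, and a brute-force search over all 12 assignments confirms that the survivor is the global minimum. Real design problems need the pairwise and higher-order variants of the criterion as well.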

Dahiyat and Mayo's target 
fold is a zinc finger, a motif with 
a well-established history in protein structure 
prediction and design. In an early, prescient 
paper, Berg correctly inferred that this 
His2Cys2 Zn-binding motif must feature a 
β-β-α fold that would position the ligating 
groups in a tetrahedral array around the 
bound Zn(II) (11). Favorable metal ion- 
ligand interactions together with a small 
apolar core help stabilize the three-dimen- 
sional structure of this compact fold. More 
recently, Imperiali and co-workers have de- 
signed a peptide that folded into this motif, 
even in the absence of metal ions (12). The 
design included a D-amino acid to stabilize a 
type II' turn, and a large, rigid tricyclic side 
chain that may help consolidate the hydro- 
phobic core. This work was particularly ex- 
citing because, before their studies, it was not 
expected that sequences as short as 25 resi- 
dues in length could fold into stable tertiary 
structures. 

Now, Dahiyat and Mayo take these studies 
one step further through the design of a se- 
quence composed of only natural amino acids 
that adopts the zinc finger motif. As input to 
their program, they introduced the coordi- 
nates of the backbone atoms from the crystal 
structure of the second domain of the zinc 
finger protein Zif268. The program then 
evaluated a total of 10^^ possible side chain- 
rotamer combinations to find a sequence ca- 
pable of stabilizing this fold without a bound 
metal ion. The resulting protein sequence 
shares a small hydrophobic core with its pre- 
decessor from Zif268. However, in the newly 
designed protein FSD-1 the core is enlarged 
through the addition of hydrophobic resi- 
dues that fill the space vacated by the re- 
moval of the metal-binding site (see the fig- 
ure). This increase in the size of the hydro- 
phobic core together with the enhancements 
in the propensity for forming the appropriate 
secondary structure provide an adequate 
driving force for folding. The designed 
miniprotein actually folds into the desired 
structure as assessed by nuclear magnetic 
resonance spectroscopy, and the observed 
structure closely resembles the three-dimen- 
sional structure of Zif268. 

Because of its small size, the protein is 
marginally stable. A van't Hoff analysis of 
the thermal unfolding curve gives a change 
in the enthalpy (ΔHvH) of approximately 
-10 kcal/mol, and indicates that the protein 
is about 90 to 95% folded at low tempera- 
tures (13). The small value of ΔHvH and the 
lack of strong cooperativity in the unfolding 
transition are expected for a native-like pro- 
tein of this very small size (14). Thus, FSD-1 
is the smallest protein known to be capable 
of folding into a unique structure without the 
thermodynamic assistance of disulfides, 
metal ions, or other subunits. This important 
accomplishment illustrates the impressive 
ability of Dahiyat and Mayo's program to 
design highly optimized sequences. 
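
The relation between a van't Hoff enthalpy and the folded fraction can be sketched with a two-state model. The code below is an illustration under stated assumptions, not the analysis used in the paper: it takes the magnitude of the enthalpy quoted above (10 kcal/mol) as the unfolding enthalpy, ignores any heat capacity change, and uses a hypothetical midpoint temperature `T_MID_K`, since no melting temperature is given in the text.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol K)

def fraction_folded(temp_k, dh_unfold_kcal, t_mid_k):
    """Two-state folded fraction from the integrated van't Hoff equation,
    assuming a temperature-independent unfolding enthalpy."""
    # K = [unfolded]/[folded]; K equals 1 at the midpoint temperature.
    k_unfold = math.exp(-(dh_unfold_kcal / R_KCAL) * (1.0 / temp_k - 1.0 / t_mid_k))
    return 1.0 / (1.0 + k_unfold)

DH_VH = 10.0      # kcal/mol, magnitude taken from the text above
T_MID_K = 315.0   # hypothetical midpoint, assumed for illustration only

for temp in (278.0, 298.0, 315.0, 335.0):
    print(temp, round(fraction_folded(temp, DH_VH, T_MID_K), 3))
```

With so small an enthalpy the transition is broad, so even well below the midpoint the model leaves a measurable unfolded population, consistent with the qualitative picture of a marginally stable, weakly cooperative miniprotein.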

This new achievement caps a banner year 
for de novo protein design. Earlier, Regan (15) 
answered the challenge of changing a protein's 
tertiary structure by altering no more than 50% 
of its sequence. And although Dahiyat and 
Mayo have demonstrated that the stabilizing 
metal-binding site is not necessary in their sys- 
tem, Caradonna, Hellinga, and co-workers 
(16) have made impressive progress in auto- 
mating the introduction of functional metal- 
binding sites into the three-dimensional struc- 
tures of natural proteins. Further, other workers 
(17) have used less automated approaches to 
successfully introduce functionally and spec- 
troscopically interesting metal-binding sites 
into de novo designed proteins. 



To date, the most computationally inten- 
sive protein design problems have been the 
redesign of natural proteins of known three- 
dimensional structure. But the new automated 
approaches open the door to the de novo design 
of structures with entirely novel backbone con- 
formations. It will be interesting to see if 
Dahiyat and Mayo's approach of designing an 
optimal sequence for a given fold is sufficient, 
or if it will be necessary also to destabilize alter- 
nate possible folds. Indeed, when using an ear- 
lier version of their algorithm to repack the 
interior of the coiled coil from GCN4, they had 
to retain the identity of a buried Asn residue 
from the wild-type protein. Although the in- 
clusion of this Asn actually destabilized the 
desired fold, it was nevertheless essential to 
avoid the formation of alternate, unwanted 
conformers (18). The ability to ask such fo- 
cused questions will reveal much about how 
natural proteins adopt their folded conforma- 
tions while simultaneously allowing the design 
of entirely new polymers for applications rang- 
ing from catalysis to pharmaceuticals. 

NUCLEIC ACIDS AND PROTEIN VARIANTS 
OF HG-CSF WITH GRANULOPOIETIC 
ACTIVITY 

This application is a continuing application of U.S. Ser. 
Nos. 60/115,131, filed Jan. 6, 1999 and of 60/118,831, filed 
Feb. 5, 1999. 

FIELD OF THE INVENTION 

The invention relates to novel granulopoietic activity 
(GPA) proteins and nucleic acids. The invention further 
relates to the use of the GPA proteins in the treatment of 
G-CSF related disorders. 

BACKGROUND OF THE INVENTION 

The colony stimulating factors are a class of protein 
hormones that stimulate the proliferation and the function of 
specific blood cell types such as granulocytes. Granulocytes 
engulf and devour microbial invaders and cell debris and 
thus are crucial to infection response. Granulocytes have 
only a 6-12 hour life span in the bloodstream and are 
destroyed as they function. Accordingly, it is necessary for the 
blood marrow stem cells to rapidly and constantly generate 
granulocytes. Granulocyte colony stimulating factor 
(G-CSF) is a protein that is essential for the proliferation and 
differentiation of granulocytes, particularly neutrophils. 

However, as a result of their fast turnover, the granulocyte 
count falls rapidly and markedly upon bone marrow 
damage, for example from treatment with traditional cancer 
treatments, including chemotherapeutic agents and 
radiation, or immunologic disorders including AIDS. 
Accordingly, treatment with hG-CSF has been shown to be 
efficacious in minimizing some of the side effects of cancer 
therapies, as well as in treatment of suppressed immune 
systems. 

However, wild-type hG-CSF has several disadvantages, 
including storage stability problems as well as a short 
half-life in the blood stream. 

To this end, variants of G-CSF are known; see for 
example U.S. Pat. Nos. 5,214,132; 5,399,345; 5,790,421; 
5,581,476; 4,999,291; 4,810,643; 4,833,127; 5,218,092; 
5,362,853; 5,830,705; 5,580,755; 5,399,345 and 5,416,195 
and references cited therein. 

However, a need still exists for proteins exhibiting both 
significant stability and granulopoietic activity. Accordingly, 
it is an object of the invention to provide granulopoietic 
activity (GPA) proteins, nucleic acids and antibodies for the 
treatment of neutrophil disorders. 

SUMMARY OF THE INVENTION 

In accordance with the objects outlined above, the present 
invention provides non-naturally occurring GPA proteins 
(e.g. the proteins are not found in nature) comprising amino 
acid sequences that are less than about 95-97% identical to 
hG-CSF. The GPA proteins have at least one biological 
property of a G-CSF protein; for example, the GPA proteins 
will stimulate cells with a G-CSF receptor to proliferate. 
Thus the invention provides GPA proteins with amino acid 
sequences that have at least about 5 amino acid substitutions 
as compared to the hG-CSF sequence shown in FIG. 1. 
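
As an illustrative aside, the identity thresholds above can be related to substitution counts with a short calculation. The sketch below is not part of the specification: the function names are hypothetical, the sequences are toy strings rather than hG-CSF, and it assumes the two sequences are already aligned and of equal length with no insertions or deletions.

```python
def count_substitutions(reference, variant):
    """Number of positions at which two equal-length sequences differ."""
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(reference, variant))

def percent_identity(reference, variant):
    """Percentage of positions carrying the same residue in both sequences."""
    matches = len(reference) - count_substitutions(reference, variant)
    return 100.0 * matches / len(reference)

# Toy 20-residue example (not the hG-CSF sequence): one substitution.
toy_reference = "ACDEFGHIKLMNPQRSTVWY"
toy_variant = "GCDEFGHIKLMNPQRSTVWY"
print(count_substitutions(toy_reference, toy_variant),
      percent_identity(toy_reference, toy_variant))
```

On this basis a fixed number of substitutions corresponds to a lower percent identity in a short sequence than in a long one, so the two ways of stating the limit are not interchangeable without knowing the sequence length.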

In a further aspect, the present invention provides non- 
naturally occurring GPA conformers that have three dimen- 
sional backbone structures that substantially correspond to 
the three dimensional backbone structure of hG-CSF. The 
amino acid sequence of the conformer and the amino acid 
sequence of the hG-CSF are less than about 95% identical. 
In one aspect, at least about 90% of the non-identical amino 
acids are in a core region of the conformer. In other aspects, 
the conformer has at least about 100% of the non-identical 
amino acids in a core region of the conformer. 

In an additional aspect, the changes are selected from the 
amino acid residues at positions selected from 14, 17, 20, 21, 
24, 27, 28, 31, 32, 34, 38, 78, 79, 85, 89, 91, 99, 102, 103, 
107, 109, 110, 113, 116, 120, 145, 146, 147, 148, 151, 153, 
155, 156, 157, 160, 161, 164, 168 and 170. Preferred 
embodiments include at least about 5 or 10 variations. 

In a further aspect, the invention provides recombinant 
nucleic acids encoding the non-naturally occurring GPA 
proteins, expression vectors comprising the recombinant 
nucleic acids, and host cells comprising the recombinant 
nucleic acids and expression vectors. 

In an additional aspect, the invention provides methods of 
producing the GPA proteins of the invention comprising 
culturing host cells comprising the recombinant nucleic 
acids under conditions suitable for expression of the nucleic 
acids. The proteins may optionally be recovered. 

In a further aspect, the invention provides pharmaceutical 
compositions comprising a GPA protein of the invention and 
a pharmaceutical carrier. 

In an additional aspect, the invention provides methods 
for treating a G-CSF responsive condition comprising 
administering a GPA protein of the invention to a patient. 
The G-CSF condition may be myelosuppressive therapy, 
chronic neutropenia, or peripheral blood progenitor cell 
collection. 

BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1 depicts the nucleic acid (SEQ ID NO:1) and amino 
acid (SEQ ID NO: 2) sequences of human G-CSF. 

FIG. 2 depicts the variable residues in each GPA set. 

FIG. 3 (SEQ ID NOS:2-15) depicts some preferred GPA 
sequences. The top line (SEQ ID NO:15) is the hG-CSF 
sequence. Any residue for which a change is not noted 
remains the same as the hG-CSF sequence. The second line 
(SEQ ID NO:3) is a GPA protein, bndry4_2, with variable 
boundary residues; 24 different positions were allowed to 
vary. The third line (SEQ ID NO:4) is a GPA protein, 
bndry4_core4, with boundary variable residues; this uti- 
lized 24 different boundary positions but used the optimal 
sequence from the core4 design as the starting template. The 
fourth line (SEQ ID NO:5) is a GPA protein, bndry4_AD, 
with boundary variable residues; however, the boundary 
residues were chosen on the outer two helices (A and D; 14 
variable residue positions) since initial calculations sug- 
gested that the most pronounced changes in helical propen- 
sity result from modifications at these locations; improve- 
ments in helical propensity might lead to improved stability. 
The fifth line (SEQ ID NO:6) is a GPA protein, bndry4_ 
AD_core4, with 14 variable boundary residues, again using 
the optimal sequence from the core4 design as the starting 
template. The sixth line (SEQ ID NO:7) is a GPA protein, 
core4, that utilized 26 different variable core positions. The 
seventh line (SEQ ID NO:8) is a GPA protein, core4_ 
V167A, that utilized 25 variable core positions. The eighth 
line (SEQ ID NO:9) is a GPA protein, core3, that had 34 core 
variable positions. 

FIG. 4 depicts the Monte Carlo analysis of the core4 GPA 
sequence. At the left is shown the hG-CSF sequence; position 
numbers are shown in the second column, the ground state 
sequence is shown in the third column and the number of 
occurrences of all amino acids found in the top 1000 Monte 
Carlo sequences is shown in the last columns. At position 17, 
for example, the hG-CSF amino acid is cysteine; in GPA 
proteins, 73.6% of the top 1000 sequences had leucine at this 
position, and 22.9% of the sequences had isoleucine. 

FIG. 5 depicts the Monte Carlo analysis of the core4v 
GPA sequence. At the left is shown the hG-CSF sequence; 
position numbers are shown in the second column, the 
ground state sequence is shown in the third column and the 
number of occurrences of all amino acids found in the top 
1000 Monte Carlo sequences is shown in the last columns. 
At position 17, for example, the hG-CSF amino acid is 
cysteine; in GPA proteins, 69.7% of the top 1000 sequences 
had leucine at this position, 5.1% of the sequences had 
valine, and 25.1% of the sequences had isoleucine. 

FIG. 6 depicts the Monte Carlo analysis of the core3 GPA 
sequence. At the left is shown the hG-CSF sequence; posi- 
tion numbers are shown in the second column, the ground 
state sequence is shown in the third column and the number 
of occurrences of all amino acids found in the top 1000 
Monte Carlo sequences is shown in the last columns. 

FIG. 7 depicts the Monte Carlo analysis of the bndry4_2 
GPA sequence. At the left is shown the hG-CSF sequence; 
position numbers are shown in the second column, the 
ground state sequence is shown in the third column and the 
number of occurrences of all amino acids found in the top 
1000 Monte Carlo sequences is shown in the last columns. 

FIG. 8 depicts the Monte Carlo analysis of the bndry4_ 
core4 GPA sequence. At the left is shown the hG-CSF 
sequence; position numbers are shown in the second 
column, the ground state sequence is shown in the third 
column and the number of occurrences of all amino acids 
found in the top 1000 Monte Carlo sequences is shown in the 
last columns. 

FIG. 9 depicts the Monte Carlo analysis of the bndry4_ 
AD GPA sequence. At the left is shown the hG-CSF 
sequence; position numbers are shown in the second 
column, the ground state sequence is shown in the third 
column and the number of occurrences of all amino acids 
found in the top 1000 Monte Carlo sequences is shown in the 
last columns. 

FIG. 10 depicts the Monte Carlo analysis of the bndry4_ 
AD_core4 GPA sequence. At the left is shown the hG-CSF 
sequence; position numbers are shown in the second 
column, the ground state sequence is shown in the third 
column and the number of occurrences of all amino acids 
found in the top 1000 Monte Carlo sequences is shown in the 
last columns. 

FIGS. 11A, 11B and 11C depict the gene sequences for 
three GPA proteins: FIG. 11A (SEQ ID NO:16) is the core3 
GPA protein, FIG. 11B (SEQ ID NO:17) is the core4 GPA 
protein, and FIG. 11C (SEQ ID NO:18) is the core4v GPA 
protein. 

FIG. 12 depicts the synthesis of a full-length gene and all 
possible mutations by PCR. Overlapping oligonucleotides 
corresponding to the full-length gene (black bar, Step 1) are 
synthesized, heated and annealed. Addition of Pfu DNA 
polymerase to the annealed oligonucleotides results in the 5' 
to 3' synthesis of DNA (Step 2) to produce longer DNA 
fragments (Step 3). Repeated cycles of heating, annealing 
(Step 4) results in the production of longer DNA, including 
some full-length molecules. These can be selected by a 
second round of PCR using primers (arrowed) correspond- 
ing to the end of the full-length gene (Step 5). 

FIG. 13 depicts the thermal stability of met hG-CSF and 
several GPA proteins by circular dichroism (CD) spectros- 
copy. CD directly measures secondary structure content of a 
protein and can track the loss of structure in response to 
temperature or chemical denaturants. FIG. 13 shows the 
increased thermal stability of core4 relative to met hG-CSF. 

FIG. 14 depicts the cell proliferation response to met 
hG-CSF and 3 novel GPA proteins. Cell proliferation of 
BaF/3 cells expressing hG-CSF receptor is shown as moni- 
tored by BrdU incorporation, plotted against protein con- 
centration. BrdU incorporation is assessed by fluorescent 
ELISA. The figure shows the increased biological activity of 
core4 relative to met hG-CSF. 

FIG. 15 depicts the kinetics of storage stability of met 
hG-CSF and core4 monitored by size exclusion chromatog- 
raphy HPLC. The two proteins were incubated in 5% 
[illegible] Tween-80 at pH 4.0 
and stored at 50° C. The protein concentration was 300 
ug/ml. Monomeric protein was considered intact. 

[illegible] temperature (Tm) and extinc- 
tion coefficients of hG-CSF and some of the novel GPA 
proteins [illegible]. 

DETAILED DESCRIPTION OF THE 
INVENTION 

The present invention is directed to novel proteins and 
nucleic acids possessing granulopoietic activity (sometimes 
referred to herein as "GPA proteins" and "GPA nucleic 
acids"). The proteins are generated using a system previ- 
ously described in WO98/47089 and U.S. Ser. No. 09/127, 
926, both of which are expressly incorporated by reference 
in their entirety, that is a computational modeling system 
that allows the generation of extremely stable proteins 
without necessarily disturbing the biological functions of the 
protein itself. In this way, novel GPA proteins and nucleic 
acids are generated, that can have a plurality of mutations in 
comparison to the wild-type enzyme yet retain significant 
activity. 

The computational method used to generate and evaluate 
the GPA proteins of the invention is briefly described as 
follows. In a preferred embodiment, the computational 
method used to generate the primary library is Protein 
Design Automation (PDA), as is described in U.S. Ser. Nos. 
60/061,097, 60/043,464, 60/054,678, 09/127,926 and PCT 
US98/07254, all of which are expressly incorporated herein 
by reference. Briefly, PDA can be described as follows. A 
known protein structure is used as the starting point. The 
residues to be optimized are then identified, which may be 
the entire sequence or subset(s) thereof. The side chains of 
any positions to be varied are then removed. The resulting 
structure consisting of the protein backbone and the remain- 
ing sidechains is called the template. Each variable residue 
position is then preferably classified as a core residue, a 
surface residue, or a boundary residue; each classification 
defines a subset of possible amino acid residues for the 
position (for example, core residues generally will be 
selected from the set of hydrophobic residues, surface resi- 
dues generally will be selected from the hydrophilic 
residues, and boundary residues may be either). Each amino 
acid can be represented by a discrete set of all allowed 
conformers of each side chain, called rotamers. Thus, to 
arrive at an optimal sequence for a backbone, all possible 
sequences of rotamers must be screened, where each back- 
bone position can be occupied either by each amino acid in 
all possible rotameric states, or a subset of amino acids, and 
thus a subset of rotamers. 

Two sets of interactions are then calculated for each 
rotamer at every position: the interaction of the rotamer side 


US 6,627,186 B1 

chain with all or part of the backbone (the "singles" energy, 
also called the rotamer/template or rotamer/backbone 
energy), and the interaction of the rotamer side chain with all 
other possible rotamers at every other position or a subset of 
the other positions (the "doubles" energy, also called the 
rotamer/rotamer energy). The energy of each of these inter- 
actions is calculated through the use of a variety of scoring 
functions, which include, but are not limited to, the energy 
of van der Waals forces, the energy of hydrogen bonding, 
the energy of secondary structure propensity, the energy of 
surface area solvation and the electrostatics. Thus, the total 
energy of each rotamer interaction, both with the backbone 
and other rotamers, is calculated, and stored in a matrix 
form. 

The discrete nature of rotamer sets allows a simple 
calculation of the number of rotamer sequences to be tested. 
A backbone of length n with m possible rotamers per 
position will have m^n possible rotamer sequences, a number 
which grows exponentially with sequence length and ren- 
ders the calculations either unwieldy or impossible in real 
time. Accordingly, to solve this combinatorial search 
problem, a "Dead End Elimination" (DEE) calculation is 
performed. The DEE calculation is based on the fact that if 
the worst total interaction of a first rotamer is still better than 
the best total interaction of a second rotamer, then the second 
rotamer cannot be part of the global optimum solution. Since 
the energies of all rotamers have already been calculated, the 
DEE approach only requires sums over the sequence length 
to test and eliminate rotamers, which speeds up the calcu- 
lations considerably. DEE can be rerun comparing pairs of 
rotamers, or combinations of rotamers, which will eventu- 
ally result in the determination of a single sequence which 
represents the global optimum energy. 

Once the global solution has been found, a Monte Carlo 
search may be done to generate a rank-ordered list of 
sequences in the neighborhood of the DEE solution. Starting 
at the DEE solution, random positions are changed to other 
rotamers, and the new sequence energy is calculated. If the 
new sequence meets the criteria for acceptance, it is used as 
a starting point for another jump. After a predetermined 
number of jumps, a rank-ordered list of sequences is gen- 
erated. In addition, as will be appreciated by those in the art, 
a Monte Carlo search may be done from a DEE run that is 
not completed; that is, a partial DEE run that has a number 
of sequences may be used to generate a Monte Carlo list. 

As outlined in U.S. Ser. No. 09/127,926, the protein 

hG-CSF protein at an equivalent position is called a "vari- 
able residue". As is known in the art, the residues, or amino 
acids, of proteins are generally sequentially numbered start- 
ing with the N-terminus of the protein. Thus a protein having 
a methionine at its N-terminus is said to have a methionine 
at residue or amino acid position 1, with the next residues as 
2, 3, 4, etc. At each position, the wild type (i.e. naturally 
occurring) protein may have one of at least 20 amino acids, 
in any number of rotamers. By "variable residue position" 
herein is meant an amino acid position of the protein to be 
designed that is not fixed in the design method as a specific 
residue or rotamer, generally the wild-type hG-CSF residue 
or rotamer. 

In a preferred embodiment, all of the residue positions of 
the protein are variable. That is, every amino acid side chain 
may be altered in the methods of the present invention. 
In an alternate preferred embodiment, only some of the 
residue positions of the protein are variable, and the remain- 
der are "fixed", that is, they are identified in the three 
dimensional structure as being a particular amino acid in a 
set conformation. In some embodiments a fixed position is 
left in its original conformation (which may or may not 
correlate to a specific rotamer of the rotamer library being 
used). Alternatively, residues may be fixed as a non-wild 
type residue; for example, when known site-directed 
mutagenesis techniques have shown that a particular residue 
is desirable (for example, to eliminate a proteolytic site or 
alter the active site), the residue may be fixed as a particular 
amino acid. Alternatively, the methods of the present inven- 
tion may be used to evaluate mutations de novo, as is 
discussed below. In an alternate preferred embodiment, a 
fixed position may be "floated"; the amino acid at that 
position is fixed, but different rotamers of that amino acid 
are tested. In this embodiment, the variable residues may be 
at least one, or anywhere from 0.1% to 99.9% of the total 
number of residues. Thus, for example, it may be possible to 
change only a few (or one) residues, or most of the residues, 
with all possibilities in between. 

In a preferred embodiment, residues which can be fixed 
include, but are not limited to, structurally or biologically 
functional residues. For example, residues which are known 
to be important for biological activity, such as the residues 
which form the binding site for a binding partner (ligand/ 
receptor, antigen/antibody, etc.), phosphorylation or glyco- 

backbonc (comprising (for a naturally occurring protein) the sylation sites which are crucial to biological function, or 

nitrogen, the carbonyl carbon, the a-carbon, and the carbo- structurally important residues, such as disulfide bridges, 

nyl oxygen, along with the direction of the vector from the binding sites, critical hydrogen bonding residues, 

a-carbon to the p-carbon) may be altered prior to the residues critical for backbone conformation such as proline 

computational analysis, by varying a set of parameters 50 glycine, residues critical for packing interactions, etc. 

called supersecondary structure parameters. ni^Y fixed in a conformation or as a single rotamer, or 

Once a protein structure backbone is generated (with "floated", 

alterations, as outlined above) and input into the computer, Similarly, residues which may be chosen as variable 

explicit hydrogens are added if not included within the residues may be those that confer undesirable biological 

structure (for example, if the structure was generated by 55 attributes, such as susceptibility to proteolytic degradation, 

X-ray crystallography, hydrogens must be added). After dimerization or aggregation sites, glycosylation sites which 

hydrogen addition, energy minimizabon of the structure is n^^y lead to immune responses, unwanted binding activity, 

run, to relax the hydrogens as well as the other atoms, bond unwanted allostery, undesirable biological activity but with 

angles and bond lengths. In a preferred embodiment, this is a preservation of binding, etc. 

done by doing a number of steps of conjugate gradient 60 In a preferred embodiment, each variable position is 
minimizabon (Mayo et al., 7. P/iyi. C/iem. 94:8897 (1990)) classified as either a core, surface or boundary residue 
of atomic coordinate positions to minimize the Dreiding position, although in some cases, as explained below, the 
force field with no electrostatics. Generally from about 10 to variable position may be set to glycine to minimize back- 
about 250 steps is preferred, with about 50 being most bone strain. 

preferred. 65 jn one embodiment, only core residues are variable resi- 

llie CPA backbone structure contains at least one variable dues; alternate embodiments utilize methods for designing 

residue position. Each GPA residue that can differ from the CPA proteins containing core, boundary and surface variable 
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residues; core and surface variable residues; core and bound- methionine. The ro tamer set for each boundary position thus 
ary variable residues; surface and boundary variable resi- potentially includes every rotamer for these seventeen resi- 
dues; as well as surface variable residues alone, or boundary dues (assuming cysteine, glycine and proline are not used, 
variable residues alone. In general, preferred embodiments although they can be). Additionally, in some preferred 
do not utilize surface variable residues, as this can lead to 5 embodiments, a set of 18 naturally occurring amino acids 
undesirable antigenicity; however, in appUcations that are (^u except cysteine and proline, which are known to be 
not related to therapeutic use of the GPA proteins, it may be particularly disruptive) are used, 
desirable to alter surface residues. _ , .... . . 

I f . , . . - Itius, as will be appreciated by those m the art, there is a 

ine classincation or residue positions as core, surface or „ . 1, * 1 c - *t. ^ 

, . i_ J • , .„ computational benefit to classifying the residue positions, as 

boundary may be done m several ways^ as wdl be ap^ :o j, decreases the number of calculations. It should also be 

ated by those m the art and putlmed m W098 47089. hereby ^^^^ ^ ^^^^^^^^ ^^^^^ 

mcorporaled by reference m Us enurety. In a preferred boundary and surface residues are altered from those 

embodiment, the classification is done via a visual scan of _ . , u r i i 

• • 1 . • u lu . . 11- .t_ •. described above; for example, under some circumstances, 

the onginal protein backbone structure, including the side „ ^ . - ^u aa ^ u* * j * 

, . J • one or more amino acids is either added or subtracted from 

chains, and assigning a classification based on a subjective 15 ,1,^ „f ,ii„„,„j t?^. ^ • 

- * f.„ . . ^ c . • J 11- the set 01 allowed amino acids. For example, some proteins 

evaluation of one skilled in the art of protem modelhng. „,u-«i, j ™ • „ u ™ • u i- ju- a- 

/-..i- which dimenze or multimenze, or have ligand binding sites, 

Alternatively, a preferred embodiment utilizes an assess- r^^„ ^^»*^:„ u.,a u^u-« . i aa *- 

, c ' . ' r t ^ t ■ may contain hydrophobic surface residues, etc. In addition, 

ment of the orientation of the Ca-Cp vectors relative to a -i *u * a * n u i- « • » *». r li 

, ., 1 i- 1 . . . . residues that do not allow helix capping or the favorable 

solvent accessible surface computed using only the template :„»^,„^t:„„ „,uu « u«r„ ^* ^1 u u» * a e 

^ , ^ i: J . J • . , ... interaction with an a-nehx dipole may be subtracted from a 

Ca atoms, in a preferred embodiment, the solvent accessible on * p u a a -ru- a c r ■ j 

c c ^ .X. ^ . r u r ij • , 20 sct of allowed residues. This modification of amino acid 

suriace for only the Ca atoms of the target fold is generated * a ^ „ „^ u ^ a u ■ 

^ „ , . . * . ^ , groups IS done on a residue by residue basis, 

using the Connolly algonthm with a probe radius ranging , - , , 

from about 4 to about 12 A, with from about 6 to about 10 ^ preferred embodiment, prohne, cysteine and glycine 
A being preferred, and 8 A being particularly preferred. The '''f''^^^ »he list of possible amino acid side 
Ca radius used ranges from about 1.6 A to about 2.3 A, with ,5 ""^^T' ^""^ ^^""^^^^ rotamers for these side chains are not 
from about 1,8 to about 2.1 A being preferred, and 1.95 A However, m a preferred embodiment, when the vari- 
being especiaUy preferred. A residue is classified as a core ^^'l'^ ""^^^^^^ P«^^^'°" ^ * ^"S*^ (^^^^ ^^^^ ^^^^^^^^ ^"^^^ 
position if a) the distance for its Ca, along its Ca-Cp vector, '^^^"^'^ 1) carbonyl carbon of the preceding amino 
to the solvent accessible surface is greater than about 4-6 A, ? ^"^^^^ ^^^i^"^' ^) 
with greater than about 5.0 A being especiaUy preferred, and 30 ^-^^rbon of the current residue; and 4) the carbonyl carbon 
b) the distance for its Cp to the nearest surface point is °f '^'^'^''^l t*'^".^ ' ^^e position is set to 
greater than about 1.5-3 A, with greater than about 2.0 A S^y^"^ mimmize backbone strain, 
being especially preferred. The remaining residues are clas- Once the group of potential rotamers is assigned for each 
sified as surface positions if the sum of the distances from variable residue position, processing proceeds as outlined in 
their Ca, along their Ca-Cp vector, to the solvent accessible 35 U.S. Sen No. 09/127,926 and PCT US98/07254. This pro- 
surface, plus the distance from their Cp to the closest surface cessing step entails analyzing interactions of the rotamers 
point was less than about 2.54 A, with less than about 2.7 A with each other and with the protein backbone to generate 
being especially preferred. All remaining residues are clas- optimized protein sequences. Simplistically, the processing 
sified as boundary positions. initially comprises the use of a number of scoring functions 

Suitable core and boundary positions for GPA proteins are 40 calculate energies of interactions of the rotamers, either to 

outlined below backbone itself or other rotamers. Preferred PDA scoring 

Once each variable position is classified as either core. ^^<^^^oxis include, but are not limited to, a Van der Waals 

surface or boundary, a set of amino acid side chains, and thus P^^^"^^^^ f hy6rog^n bond potential scor- 

^setofrotamers,isassigncdtocachposition.'niatis,thcsct >ng function, an atomic solvation scoring fiinction, a sec- 

of possible amino acid side chains that the program will 45 ondary structure propensity scormg function and an electro- 

aUow to be considered at any particular position is chosen. staticsconng function. As is further described below, at least 

Subsequently, once the possible amino acid side chains are ^^""S ^ ^ ^^^^^ vosiXxon. although 

chosen, the set of rotamers that will be evaluated at a functions may differ depending on the position 

particular position can be determined. Thus, a core residue classification or other considerations like favorable inter- 

will generaUy be selected from the group of hydrophobic 50 an a-helix dipole. As outlined below, the total 

residues consisting of alanine, valine, isoleucine, leucine, ^°^^gy 'I calculations is the sum of the 

phenylalanine, tyrosine, tryptophan, and methionine (in energy of each scormg function used at a parUcular position, 

some embodiments, when the a scaling factor of the van der ^ generally shown m Equation 1: 

Waals scoring function, described below, is low, methionine £,..rn^^.+nf«+«£..w,„,+«^^+«^./ec Equation i 

is removed from the set), and the rotamer set for each core 55 

position potentially includes rotamers for these eight amino In Equation 1, the total energy is the sum of the energy of 

acid side chains (all the rotamers if a backbone independent the van der Waals potential (E^^), the energy of atomic 

library is used, and subsets if a rotamer dependent backbone solvation (E„), the energy of hydrogen bonding ifih-bondin^^ 

is used). Similarly, surface positions are generally selected the energy of secondary structure (E^^and the energy of 

from the group of hydrophilic residues consisting of alanine, 60 electrostatic interaction (E^,^J. The term n is either 0 or 1, 

serine, threonine, aspartic acid, asparagine, glutamine, depending on whether the term is to be considered for the 

glutamic acid, arginine, lysine and histidine. The rotamer set particular residue position. 

for each surface position thus includes rotamers for these ten As outlined in U.S. Ser. Nos. 60/061,097, 60/043,464, 

residues. Finally, boundary positions are generally chosen 60/054,678, 09/127,926 and PCT US98/07254, any combi- 

from alanine, serine, threonine, aspartic acid, asparagine, 65 nation of these scoring functions, either alone or in 

glutamine, glutamic acid, arginine, lysine histidine, valine, combination, may be used. Once the scoring functions to be 

isoleucine, leucine, phenylalanine, tyrosine, tryptophan, and used are identified for each variable position, the preferred 
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first step in the computational analysis comprises the deter- 
mination of the interaction of each possible rotamer with all 
or part of the remainder of the protein. That is, the energy of 
interaction, as measured by one or more of the scoring 
functions, of each possible rotamer at each variable residue 5 
position with either the backbone or other rotamers, is 
calculated. In a preferred embodiment, the interaction of 
each rotamer with the entire remainder of the protein, i.e. 
both the entire template and all other rotamers, is done. 
However, as outlined above, it is possible to only model a lo 
portion of a protein, for example a domain of a larger 
protein, and thus in some cases, not all of the protein need 
be considered. 
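
As a rough illustration of how the sum in Equation 1 can be evaluated, the following Python sketch adds the five scoring-function terms with a 0/1 switch per term. The term names, the dictionary layout and the numeric energies are illustrative assumptions and are not values from the specification.

```python
# Term names mirror Equation 1; everything else here is a hypothetical layout.
TERMS = ("vdw", "as", "h_bonding", "ss", "elec")


def total_energy(energies, use_term):
    """Sum n_k * E_k over the scoring terms, with every n_k either 0 or 1."""
    for name in TERMS:
        if use_term[name] not in (0, 1):
            raise ValueError(f"switch for {name!r} must be 0 or 1")
    return sum(use_term[name] * energies[name] for name in TERMS)


# Made-up energies for one rotamer at one position (arbitrary units).
energies = {"vdw": -2.5, "as": -1.0, "h_bonding": -0.5, "ss": 0.25, "elec": -0.25}

# Example: a position scored only with van der Waals and atomic solvation.
core_switches = {"vdw": 1, "as": 1, "h_bonding": 0, "ss": 0, "elec": 0}
all_switches = {name: 1 for name in TERMS}

print(total_energy(energies, core_switches))  # -3.5
print(total_energy(energies, all_switches))   # -4.0
```

Keeping the switches separate from the energies makes it easy to use a different set of scoring functions for core, boundary and surface positions.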

In a preferred embodiment, the first step of the
computational processing is done by calculating two sets of
interactions for each rotamer at every position: the interaction of
the rotamer side chain with the template or backbone (the
"singles" energy), and the interaction of the rotamer side
chain with all other possible rotamers at every other position
(the "doubles" energy), whether that position is varied or
floated. It should be understood that the backbone in this
case includes both the atoms of the protein structure
backbone, as well as the atoms of any fixed residues,
wherein the fixed residues are defined as a particular
conformation of an amino acid.

Thus, "singles" (rotamer/template) energies are calculated
for the interaction of every possible rotamer at every
variable residue position with the backbone, using some or all of
the scoring functions. Thus, for the hydrogen bonding
scoring function, every hydrogen bonding atom of the
rotamer and every hydrogen bonding atom of the backbone
is evaluated, and the E_{h-bonding} is calculated for each possible
rotamer at every variable position. Similarly, for the van der
Waals scoring function, every atom of the rotamer is
compared to every atom of the template (generally excluding the
backbone atoms of its own residue), and the E_{vdW} is
calculated for each possible rotamer at every variable residue
position. In addition, generally no van der Waals energy is
calculated if the atoms are connected by three bonds or less.
For the atomic solvation scoring function, the surface of the
rotamer is measured against the surface of the template, and
the E_{as} for each possible rotamer at every variable residue
position is calculated. The secondary structure propensity
scoring function is also considered as a singles energy, and
thus the total singles energy may contain an E_{ss} term. As will
be appreciated by those in the art, many of these energy
terms will be close to zero, depending on the physical
distance between the rotamer and the template position; that
is, the farther apart the two moieties, the lower the energy.

For the calculation of "doubles" energy (rotamer/rotamer),
the interaction energy of each possible rotamer is
compared with every possible rotamer at all other variable
residue positions. Thus, "doubles" energies are calculated
for the interaction of every possible rotamer at every
variable residue position with every possible rotamer at every
other variable residue position, using some or all of the
scoring functions. Thus, for the hydrogen bonding scoring
function, every hydrogen bonding atom of the first rotamer
and every hydrogen bonding atom of every possible second
rotamer is evaluated, and the E_{h-bonding} is calculated for each
possible rotamer pair for any two variable positions.
Similarly, for the van der Waals scoring function, every atom
of the first rotamer is compared to every atom of every
possible second rotamer, and the E_{vdW} is calculated for each
possible rotamer pair at every two variable residue positions.
For the atomic solvation scoring function, the surface of the
first rotamer is measured against the surface of every
possible second rotamer, and the E_{as} for each possible rotamer
pair at every two variable residue positions is calculated.
The secondary structure propensity scoring function need
not be run as a "doubles" energy, as it is considered as a
component of the "singles" energy. As will be appreciated
by those in the art, many of these double energy terms will
be close to zero, depending on the physical distance between
the first rotamer and the second rotamer; that is, the farther
apart the two moieties, the lower the energy.
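
The bookkeeping for the singles and doubles energies can be sketched as two lookup tables. The Python below is a minimal illustration under stated assumptions: the rotamer labels are hypothetical, and `toy_single` and `toy_pair` are placeholder scores standing in for the real scoring functions (the pair score uses sequence separation as a crude stand-in for physical distance).

```python
from itertools import combinations

# Hypothetical rotamer labels at three variable positions (illustration only).
ROTAMERS = {
    17: ["Leu_r1", "Leu_r2"],
    21: ["Val_r1"],
    24: ["Phe_r1", "Phe_r2", "Phe_r3"],
}


def toy_single(pos, rot):
    """Placeholder rotamer/template score; a real run would sum scoring terms."""
    return -0.1 * len(rot)


def toy_pair(pos_i, rot_i, pos_j, rot_j):
    """Placeholder rotamer/rotamer score that decays toward zero with separation."""
    return -1.0 / abs(pos_i - pos_j)


def build_energy_tables(rotamers, single_score, pair_score):
    """Precompute and store the singles and doubles energies as lookup tables."""
    singles = {(i, r): single_score(i, r)
               for i, rots in rotamers.items() for r in rots}
    doubles = {}
    for i, j in combinations(sorted(rotamers), 2):  # each position pair once
        for r in rotamers[i]:
            for s in rotamers[j]:
                doubles[(i, r, j, s)] = pair_score(i, r, j, s)
    return singles, doubles


singles, doubles = build_energy_tables(ROTAMERS, toy_single, toy_pair)
print(len(singles), "singles energies;", len(doubles), "doubles energies")
# prints: 6 singles energies; 11 doubles energies
```

Because every energy is computed once and stored, later search steps only need table lookups and sums.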

Once the singles and doubles energies are calculated and 
stored, the next step of the computational processing may 
occur. As outlined in U.S. Ser. No. 09/127,926 and PCT 
US98/07254, preferred embodiments utilize a Dead End 
Elimination (DEE) step, and preferably a Monte Carlo step. 
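
The elimination test behind a DEE step can be sketched as follows. This is a minimal illustration of the singles criterion only (a rotamer is dropped when its best possible total energy is still worse than the worst possible total energy of a competing rotamer at the same position), with lower energy taken as more favorable. The positions, rotamer names and energies are made-up toy values, and the function names are hypothetical.

```python
# Toy energies for two positions, "A" and "B" (illustrative values only).
ROTAMERS = {"A": ["a1", "a2"], "B": ["b1", "b2"]}
SINGLES = {("A", "a1"): 0.0, ("A", "a2"): 5.0,
           ("B", "b1"): 0.0, ("B", "b2"): 1.0}
DOUBLES = {("A", "a1", "B", "b1"): -1.0, ("A", "a1", "B", "b2"): 0.0,
           ("A", "a2", "B", "b1"): 0.0, ("A", "a2", "B", "b2"): 1.0}


def pair(i, r, j, s):
    """Look up a stored doubles energy regardless of argument order."""
    key = (i, r, j, s)
    return DOUBLES[key] if key in DOUBLES else DOUBLES[(j, s, i, r)]


def dee_singles(rotamers, singles, pair_energy):
    """Repeat the singles elimination test until no rotamer can be removed."""
    alive = {i: list(rots) for i, rots in rotamers.items()}
    changed = True
    while changed:
        changed = False
        for i in list(alive):
            others = [j for j in alive if j != i]

            def best(r):   # most favorable total rotamer r could reach
                return singles[(i, r)] + sum(
                    min(pair_energy(i, r, j, s) for s in alive[j]) for j in others)

            def worst(r):  # least favorable total rotamer r could reach
                return singles[(i, r)] + sum(
                    max(pair_energy(i, r, j, s) for s in alive[j]) for j in others)

            threshold = min(worst(t) for t in alive[i])
            keep = [r for r in alive[i] if best(r) <= threshold]
            if len(keep) < len(alive[i]):
                alive[i] = keep
                changed = True
    return alive


print(dee_singles(ROTAMERS, SINGLES, pair))  # {'A': ['a1'], 'B': ['b1']}
```

In this toy case the singles test alone leaves one rotamer per position; in general, pair-based tests or a follow-up search are needed to finish the job.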

The computational processing results in a set of optimized 
GPA protein sequences. These optimized GPA protein 
sequences are generally significantly different from the 
wild-type hG-CSF sequence from which the backbone was 
taken. 
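
The Monte Carlo step that produces a rank-ordered list of sequences around the DEE solution can be sketched as below. The specification does not state the acceptance criterion, so the Metropolis rule used here is an assumption, as are the temperature, the jump count, the seed and the toy energy tables.

```python
import math
import random

# Toy model: two positions with made-up energies (illustrative values only).
ROTAMERS = {"A": ["a1", "a2"], "B": ["b1", "b2"]}
SINGLES = {("A", "a1"): 0.0, ("A", "a2"): 5.0,
           ("B", "b1"): 0.0, ("B", "b2"): 1.0}
DOUBLES = {("a1", "b1"): -1.0, ("a1", "b2"): 0.0,
           ("a2", "b1"): 0.0, ("a2", "b2"): 1.0}


def energy(seq):
    """Total energy of a full rotamer assignment: singles plus doubles."""
    return (SINGLES[("A", seq["A"])] + SINGLES[("B", seq["B"])]
            + DOUBLES[(seq["A"], seq["B"])])


def monte_carlo_list(start, rotamers, energy_fn, n_jumps=200,
                     temperature=1.0, seed=0):
    """Random single-position jumps from `start`; return visited sequences
    ranked by energy (lowest first)."""
    rng = random.Random(seed)
    current, current_e = dict(start), energy_fn(start)
    seen = {tuple(sorted(current.items())): current_e}
    for _ in range(n_jumps):
        pos = rng.choice(sorted(rotamers))
        choices = [r for r in rotamers[pos] if r != current[pos]]
        if not choices:
            continue
        trial = dict(current)
        trial[pos] = rng.choice(choices)
        trial_e = energy_fn(trial)
        # Metropolis acceptance (an assumption, not taken from the text).
        accept = trial_e <= current_e or \
            rng.random() < math.exp((current_e - trial_e) / temperature)
        if accept:
            current, current_e = trial, trial_e
            seen[tuple(sorted(current.items()))] = current_e
    return sorted(seen.items(), key=lambda item: item[1])


ranked = monte_carlo_list({"A": "a1", "B": "b1"}, ROTAMERS, energy)
for sequence, e in ranked:
    print(sequence, e)
```

Starting from the lowest-energy assignment guarantees that it heads the ranked list; the remaining entries are whichever neighbors the random walk happened to accept.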

Thus, in the broadest sense, the present invention is 
directed to GPA proteins that have granulopoietic activity. 
By "granulopoietic activity" or "GPA" herein is meant that 
the protein exhibits at least one, and preferably more, of the 
biological functions of a granulocyte-colony stimulating 
factor (G-CSF), as defined below. 

By "protein" herein is meant at least two covalently 
attached amino acids, which includes proteins, polypeptides, 
oligopeptides and peptides. The protein may be made up of 
naturally occurring amino acids and peptide bonds, or synthetic
peptidomimetic structures, generally depending on the
method of synthesis. Thus "amino acid", or "peptide 
residue", as used herein means both naturally occurring and 
synthetic amino acids. For example, homo-phenylalanine, 
citrulline and norleucine are considered amino acids for the
purposes of the invention. "Amino acid" also includes imino 
acid residues such as proline and hydroxyproline. The side 
chains may be in either the (R) or the (S) configuration. In 
the preferred embodiment, the amino acids are in the (S) or 
L-configuration. If non-naturally occurring side chains are 
used, non-amino acid substituents may be used, for example 
to prevent or retard in vivo degradations. Proteins including 
non-naturally occurring amino acids may be synthesized or 
in some cases, made recombinantly; see van Hest et al., 
FEBS Lett. 428(1-2):68-70, May 22, 1998 and Tang et al.,
Abstr. Pap. Am. Chem. S. 218:U138-U138 Part 2, Aug. 22,
1999, both of which are expressly incorporated by reference 
herein. 

The GPA proteins of the invention exhibit at least one 
biological function of a G-CSF. By "granulocyte colony 
stimulating factor" or "G-CSF" herein is meant a wild type
G-CSF. The G-CSF may be from any number of organisms, 
with G-CSFs from mammals being particularly preferred. 
Suitable mammals include, but are not limited to, rodents 
(rats, mice, hamsters, guinea pigs, etc.), primates, farm 
animals (including sheep, goats, pigs, cows, horses, etc.) and
in the most preferred embodiment, from humans (this is
sometimes referred to herein as hG-CSF, the sequence of
which is depicted in FIG. 1). As will be appreciated by those
in the art, GPAs based on G-CSFs from mammals other than
humans may find use in animal models of human disease.
The GI numbers for a variety of mammalian species are as
follows: bovine 442671; dog 442673; sheep 310382; cat
CAA69853; pig 2411469; mouse 309248; rat 1680659.

The GPA proteins of the invention exhibit at least one
biological function of a G-CSF. By "biological function" or
"biological property" herein is meant any one of the
properties or functions of a G-CSF, including, but not limited to,
the ability to stimulate cell proliferation, particularly of
hematopoietic stem cells to produce granulocytes and
particularly neutrophils; the ability to treat severe chronic
neutropenia; the use in harvesting peripheral blood
progenitor cells; the ability to enhance bone marrow transplantation
therapy; as well as the stimulation of CFU-GM type cells.

In a preferred embodiment, the biological function is
granulopoietic activity (GPA). GPA is defined as the ability
of the compound to stimulate cells that have a G-CSF
receptor to proliferate. However, in some embodiments,
GPA proteins may not possess GPA activity.

In a preferred embodiment, the assay system used to
determine GPA is an in-vitro system as described in the
examples, using Ba/F3 cells stably transfected with the gene
encoding the human Class 1 G-CSF receptor; see Young et
al. Protein Sci. 6:1228-1236 (1997), hereby expressly
incorporated by reference in its entirety. In this system, cell
proliferation is measured as a function of BrdU
incorporation, which is incorporated into the nucleic acid of
the proliferating cells. An increase above background of at
least about 20%, with at least about 50% being preferred and
at least about 100%, 500% and 1000% being especially
preferred is an indication of GPA. An alternative assay is the
CFU-GM cell assay as described in Zsebo et al.,
Immunobiology 172:175-184 (1986), also expressly incorporated by
reference in its entirety.

In a preferred embodiment, an in-vivo system can be used
to assay for GPA. For example, a suitable system is as
described in U.S. Pat. No. 4,999,291, hereby incorporated
by reference in its entirety. In general, in vivo assays require
the administration of the GPA protein (or, in the case of gene
therapy, of the GPA nucleic acid) to a suitable animal,
followed by monitoring of the granulocyte count (or in some
cases monitoring lymphocytes can be done) of the animal. In
general, increases in neutrophil, granulocyte or lymphocyte
counts without corresponding increases in erythrocyte counts
are indicative of G-CSF activity. Similarly, a useful in vivo
assay system is as follows: male C57BL/6N mice are rendered
neutropenic with a single intraperitoneal injection of
200 mg/kg cyclophosphamide (CPA). Beginning 24 hrs later and
for 4 consecutive days from the day after the dosing with CPA,
the mice are given a daily intravenous injection of 100 μg/kg
of rhG-CSF, novel granulopoietic protein, or control vehicle.
Granulopoietic activity is assayed on day 5 by bleeding the mice
retro-orbitally and counting the number of white blood cells
and polymorphonuclear neutrophils. See Hattori et al.,
Blood 75:1228-1233 (1990), expressly incorporated by
reference in its entirety.

In a preferred embodiment, the antigenic profile in the
host animal of the GPA protein is similar, and preferably
identical, to the antigenic profile of the host G-CSF; that is,
the GPA protein does not significantly stimulate the host
organism (e.g. the patient) to an immune response; that is,
any immune response is not clinically relevant and there is
no allergic response or neutralization of the protein by an
antibody. That is, in a preferred embodiment, the GPA
protein does not contain additional or different epitopes from
the G-CSF. By "epitope" or "determinant" herein is meant a
portion of a protein which will generate and/or bind an
antibody. Thus, in most instances, no significant amount of
antibodies are generated to a GPA protein. In general, this is
accomplished by not significantly altering surface residues,
as outlined below, and by not adding any amino acid residues
on the surface which can become glycosylated, as novel
glycosylation can result in an immune response.

The GPA proteins and nucleic acids of the invention are
distinguishable from naturally occurring G-CSFs. A
"naturally occurring G-CSF" is one that exists in nature and
includes allelic variations; a representative sequence is the
human sequence (hG-CSF) shown in FIG. 1. It should be
noted that unless otherwise stated, all positional numbering
is based on this human G-CSF sequence. That is, as will be
appreciated by those in the art, an alignment of G-CSF
proteins and GPA proteins can be done using standard
programs, as is outlined below, with the identification of
"equivalent" positions between the two proteins. Thus, the
GPA proteins and nucleic acids of the invention are
non-naturally occurring; that is, they do not exist in nature.

Thus, in a preferred embodiment, the GPA protein has an
amino acid sequence that differs from a wild-type G-CSF
sequence by at least 3% of the residues. That is, the GPA
proteins of the invention are less than about 97% identical to
a G-CSF amino acid sequence. Accordingly, a protein is a
"GPA protein" if the overall homology of the protein
sequence to the amino acid sequence shown in FIG. 1 is
preferably less than about 97%, more preferably less than
about 95%, even more preferably less than about 90% and
most preferably less than 85%. In some embodiments the
homology will be as low as about 75 to 80%. Stated
differently, based on the hG-CSF sequence of 174 residues,
GPA proteins have at least about 5 residues that differ from
the hG-CSF sequence (3%), with GPA proteins having from
5 residues to upwards of 30 residues being different from the
hG-CSF sequence. In some instances, GPA proteins have 3
or 4 different residues from the hG-CSF sequence. Preferred
GPA proteins have 10-24 different residues with from about
10 to about 14 being particularly preferred (that is, 6-8% of
the protein is not identical to hG-CSF).
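
The residue counts and percentages quoted above can be checked with a few lines of arithmetic. The sketch below assumes only the 174-residue hG-CSF length stated in the text; the function name is hypothetical.

```python
HG_CSF_LENGTH = 174  # residues, as stated in the text


def percent_different(n_changed, length=HG_CSF_LENGTH):
    """Percentage of positions that differ from the reference sequence."""
    return 100.0 * n_changed / length


for n in (5, 10, 14, 24, 30):
    diff = percent_different(n)
    print(f"{n:2d} changed residues: {diff:5.2f}% different, "
          f"{100.0 - diff:5.2f}% identical")
```

With 174 residues, 5 substitutions correspond to roughly 2.9% and 10 to 14 substitutions to roughly 5.7% to 8.0%, which matches the rounded "3%" and "6-8%" figures given in the text.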

Homology in this context means sequence similarity or
identity, with identity being preferred. As is known in the art,
a number of different programs can be used to identify
whether a protein (or nucleic acid as discussed below) has
sequence identity or similarity to a known sequence.
Sequence identity and/or similarity is determined using
standard techniques known in the art, including, but not
limited to, the local sequence identity algorithm of Smith &
Waterman, Adv. Appl. Math., 2:482 (1981), by the sequence
identity alignment algorithm of Needleman & Wunsch, J.
Mol. Biol., 48:443 (1970), by the search for similarity
method of Pearson & Lipman, Proc. Natl. Acad. Sci. U.S.A.,
85:2444 (1988), by computerized implementations of these
algorithms (GAP, BESTFIT, FASTA, and TFASTA in the
Wisconsin Genetics Software Package, Genetics Computer
Group, 575 Science Drive, Madison, Wis.), the Best Fit
sequence program described by Devereux et al., Nucl. Acid
Res., 12:387-395 (1984), preferably using the default
settings, or by inspection. Preferably, percent identity is
calculated by FastDB based upon the following parameters:
mismatch penalty of 1; gap penalty of 1; gap size penalty of
0.33; and joining penalty of 30, "Current Methods in
Sequence Comparison and Analysis," Macromolecule
Sequencing and Synthesis, Selected Methods and
Applications, pp 127-149 (1988), Alan R. Liss, Inc.
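
To make the identity calculation concrete, the following Python sketch performs a basic global alignment in the style of Needleman & Wunsch and then reports percent identity. It is an illustration only: the match, mismatch and gap scores are arbitrary choices (they are not the FastDB parameters listed above), the test sequences are made up, and the denominator used is the residue count of the longer sequence, following the definition this specification gives for WU-BLAST-2 based identity.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment with a linear gap cost; returns two gapped strings."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a, aligned_b):
    """Identical aligned residues divided by the length of the longer sequence."""
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    longer = max(len(aligned_a.replace("-", "")), len(aligned_b.replace("-", "")))
    return 100.0 * matches / longer


aligned = needleman_wunsch("ACDEFG", "ACDFG")
print(aligned)                                # ('ACDEFG', 'ACD-FG')
print(round(percent_identity(*aligned), 2))   # 83.33
```

Production tools add substitution matrices and affine gap penalties, so their alignments and identity values can differ from this simplified scheme.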

An example of a useful algorithm is PILEUP. PILEUP
creates a multiple sequence alignment from a group of
related sequences using progressive, pairwise alignments. It
can also plot a tree showing the clustering relationships used
to create the alignment. PILEUP uses a simplification of the
progressive alignment method of Feng & Doolittle, J. Mol.
Evol. 35:351-360 (1987); the method is similar to that
described by Higgins & Sharp CABIOS 5:151-153 (1989).
Useful PILEUP parameters include a default gap weight of
3.00, a default gap length weight of 0.10, and weighted end
gaps.

Another example of a useful algorithm is the BLAST
algorithm, described in Altschul et al., J. Mol. Biol. 215,
403-410, (1990) and Karlin et al., Proc. Natl. Acad. Sci.
U.S.A. 90:5873-5787 (1993). A particularly useful BLAST
program is the WU-BLAST-2 program which was obtained
from Altschul et al., Methods in Enzymology, 266:460-480
(1996); http://blast.wustl/edu/blast/README.html].
WU-BLAST-2 uses several search parameters, most of
which are set to the default values. The adjustable
parameters are set with the following values: overlap span=1,
overlap fraction=0.125, word threshold (T)=11. The HSP S
and HSP S2 parameters are dynamic values and are
established by the program itself depending upon the composition
of the particular sequence and composition of the particular
database against which the sequence of interest is being
searched; however, the values may be adjusted to increase
sensitivity.

An additional useful algorithm is gapped BLAST as
reported by Altschul et al., Nucleic Acids Res., 25:3389-3402.
Gapped BLAST uses BLOSUM-62 substitution scores;
threshold T parameter set to 9; the two-hit method to trigger
ungapped extensions; charges gap lengths of k a cost of
10+k; X_u set to 16, and X_g set to 40 for database search stage
and to 67 for the output stage of the algorithms. Gapped
alignments are triggered by a score corresponding to ~22
bits.

A % amino acid sequence identity value is determined by
the number of matching identical residues divided by the
total number of residues of the "longer" sequence in the
aligned region. The "longer" sequence is the one having the
most actual residues in the aligned region (gaps introduced
by WU-Blast-2 to maximize the alignment score are
ignored).

In a similar manner, "percent (%) nucleic acid sequence
identity" with respect to the coding sequence of the
polypeptides identified herein is defined as the percentage of
nucleotide residues in a candidate sequence that are identical with
the nucleotide residues in the coding sequence of the cell
cycle protein. A preferred method utilizes the BLASTN
module of WU-BLAST-2 set to the default parameters, with
overlap span and overlap fraction set to 1 and 0.125,
respectively.

The alignment may include the introduction of gaps in the
sequences to be aligned. In addition, for sequences which
contain either more or fewer amino acids than the protein
encoded by the sequence of FIG. 1, it is understood that in
one embodiment, the percentage of sequence identity will be
determined based on the number of identical amino acids in
relation to the total number of amino acids. Thus, for
example, sequence identity of sequences shorter than that
shown in FIG. 1, as discussed below, will be determined

FIG. 1. Thus, in a preferred embodiment, included within the
definition of GPA proteins are portions or fragments of the
sequences depicted herein. Fragments of GPA proteins are
considered GPA proteins if a) they share at least one
antigenic epitope; b) have at least the indicated homology;
and preferably have GPA biological activity as defined
herein.

In a preferred embodiment, as is more fully outlined
below, the GPA proteins include further amino acid
variations, as compared to a wild-type G-CSF, than those
outlined herein. In addition, as outlined herein, any of the
variations depicted herein may be combined in any way to
form additional novel GPA proteins.

In addition, GPA proteins can be made that are longer than
those depicted in the figures, for example, by the addition of
epitope or purification tags, as outlined herein, the addition
of other fusion sequences, etc. For example, the GPA
proteins of the invention may be fused to other therapeutic
proteins such as IL-11 or to other proteins such as Fc or
serum albumin for pharmacokinetic purposes. See for
example U.S. Pat. Nos. 5,766,883 and 5,876,969, both of
which are expressly incorporated by reference.

In a preferred embodiment, the GPA proteins comprise
variable residues in core and boundary residues.

hG-CSF core residues are as follows: positions 17, 21, 24,
28, 31, 35, 41, 47, 54, 56, 75, 78, 82, 85, 88, 89, 92, 95, 99,
103, 106, 110, 113, 114, 117, 140, 149, 150, 151, 152, 153,
154, 157, 160, 161 and 168. Accordingly, in a preferred
embodiment, GPA proteins have variable positions selected
from these positions.

In a preferred embodiment, GPA proteins have variable
positions selected solely from core residues of hG-CSF.
Alternatively, at least a majority (51%) of the variable
positions are selected from core residues, with at least about
75% of the variable positions being preferably selected from
core residue positions, and at least about 90% of the variable
positions being particularly preferred. A specifically
preferred embodiment has only core variable positions altered
as compared to hG-CSF.

Particularly preferred embodiments where GPA proteins
have variable core positions as compared to hG-CSF are
shown in the Figures.

In one embodiment, the variable core positions are altered
to any of the other 19 amino acids. In a preferred
embodiment, the variable core residues are chosen from Ala,
Val, Phe, Ile, Leu, Tyr and Trp. hG-CSF boundary residues
are as follows: positions 14, 20, 27, 32, 34, 38, 77, 79, 84,
91, 99, 102, 107, 109, 116, 120, 145, 146, 147, 155, 156, 164
and 170. Accordingly, in a preferred embodiment, GPA
proteins have variable positions selected from these
positions.

In a preferred embodiment, the boundary core positions

using the number of amino acids in the shorter sequence, in are altered to any of the other 19 amino acids. In a preferred 

one embodiment. In percent identity calculations relative embodiment, the variable boundary residues are chose from 

weight is not assigned lo various manifestations of sequence Ala, Val, Leu, He, Asp, Asn, Glu, Gin, Lys, Ser, ITir and His 

variation, such as, insertions, deletions, substitutions, etc. 55 (preferably protonated His). 

In one embodiment, only identities are scored positively In a preferred embodiment, the GPA protein of the inven- 

(+1) and all forms of sequence variation including gaps are tion has a sequence that differs from a wild-type G-CSF 

assigned a value of "0", which obviates the need for a protein in at least one amino acid position selected from 

weighted scale or parameters as described below for position 14, 17, 20, 21, 24, 27, 28, 31, 32, 34, 38, 78, 79, 85, 

sequence similarity calculations. Percent sequence identity 60 89, 91, 99, 102, 103, 107, 109, 110, 113, 116, 120, 145, 146, 

can be calculated, for example, by dividing the number of 147, 148, 151, 153, 155, 156, 157, 160, 161, 164, 168 and 

matching identical residues by the total number of residues 170; see also FIG. 2 which outlines sets of amino acid 

of the "shorter*' sequence in the aligned region and multi- positions. 

plying by 100. The "longer" sequence is the one having the Preferred amino acids for each position, including the 

most actual residues in the aligned region. 65 hG-CSF residue, are shown in FIGS. 3-10. Thus, for 

'Hius, GPA proteins of the present invention may be example, at position 17, preferred amino acids are Leu, Val 

shorter or longer than the amino acid sequence shown in and He; al position 21, Val, He, Phe, Ala, and Tyr; etc. 
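
The percent identity calculation described above (identities scored +1, everything else 0, divided by the residue count of the "shorter" or, alternatively, the "longer" sequence in the aligned region, times 100) can be illustrated with a short sketch. The function name, the `reference` parameter and the use of `-` as the gap character are assumptions made for illustration; this is not the WU-BLAST-2 implementation.

```python
def percent_identity(aligned_a, aligned_b, reference="shorter"):
    """Percent identity over an aligned region.

    aligned_a and aligned_b are equal-length strings taken from a pairwise
    alignment, with '-' marking gaps (an assumed convention). Only identical
    residue pairs count; gaps and mismatches contribute nothing.
    reference selects the denominator: the residue count of the 'shorter'
    or of the 'longer' sequence in the aligned region (gaps are ignored).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    matches = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-")
    residues_a = sum(1 for a in aligned_a if a != "-")
    residues_b = sum(1 for b in aligned_b if b != "-")

    if reference == "shorter":
        denominator = min(residues_a, residues_b)
    elif reference == "longer":
        denominator = max(residues_a, residues_b)
    else:
        raise ValueError("reference must be 'shorter' or 'longer'")
    if denominator == 0:
        raise ValueError("aligned region contains no residues")

    return 100.0 * matches / denominator
```

With the toy alignment `ACDE-G` against `ACDEFG`, five residues match; the shorter sequence has five residues and the longer has six, so the two conventions give different values.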
Preferred changes are as follows: Leu14Ile; Cys17Ala; Cys17Leu; Cys17Ile; Gln20Leu;
Val21Ile; Val21Ala; Val21Phe; Val21Tyr; Ile24Ala; Ile24Val; Ile24Leu; Asp27Glu; Asp27Ser;
Gly28Ala; Gly28Leu; Leu31Val; Gln32Leu; Gln32Val; Gln32Ile; Lys34Glu; Lys34Gln; Lys35Ile;
Lys35Val; Thr38His; Thr38Val; Thr38Ile; Thr38Glu; Thr38Lys; Leu78Phe; Leu78Ala; Leu78Val;
Leu78Ile; Leu78Tyr; His79Leu; Leu82Ala; Leu82Phe; Tyr85Val; Tyr85Ile; Tyr85Phe; Tyr85Trp;
Leu89Phe; Leu89Trp; Ala91Lys; Leu92Phe; Leu99Glu; Thr102Lys; Thr102Val; Thr102Leu;
Thr102Ile; Thr102Glu; Thr102Gln; Leu103Val; Leu103Ile; Leu103Ala; Leu106Val; Gln107Ile;
Gln107Val; Gln107Leu; Val109Glu; Val109Asp; Val109Gln; Val110Ala; Val110Leu; Val110Ile;
Phe113Ala; Phe113Leu; Thr116Ile; Thr116Val; Thr116Leu; Thr116Glu; Thr116Ala; Ile117Val;
Ile117Leu; Ile117Phe; Ile117Trp; Gln120Leu; Gln145Glu; Arg146Lys; Arg146Gln; Arg147Glu;
Arg147Lys; Ala148Asp; Ala148Thr; Val151Ile; Val153Ile; Ser155Ile; His156Leu; Leu157Ala;
Leu157Val; Leu157Ile; Phe160Trp; Leu161Phe; Ser164Ala; Leu157Ile; Phe160Trp; Leu161Phe;
Leu161Ala; Leu161Val; Val167Ala; Leu168Phe; His170Asp; His170Leu; His170Glu; His170Gln; and
His170Lys. These may be done either individually or in combination, with any combination
being possible. However, as outlined herein, preferred embodiments utilize at least four, and
preferably more, variable positions in each GPA protein.

Particularly preferred sequences are selected from the group consisting of: C17L, G28A, L78F,
Y85F, L103V, V110I, F113L, V151I, V153I and L168F, SEQ ID NO: 7; and L14I, Q20L, D27E, Q32L,
K34E, T38H, H79L, A91K, T102K, Q107I, D109E, T116I, Q120L, R146K, R147E, A148D, S155I,
H156L, S163A, SEQ ID NO: 18.

In a preferred embodiment, the GPA proteins do not have sole single variable positions at
positions 17, 24, 35, 41, 18, 68, 26, 174, 170, 167, 44, 47, 23, 20, 28, 127, 138, 13, 121
or 124. Similarly, preferred embodiments of GPA proteins do not only have two variable
positions at 127 and 138 or 37 and 43. In a preferred embodiment, the GPA proteins do not
have only three variable positions at 17, 24 and 41; 17, 24 and 35; and 17, 35 and 41.
Furthermore, preferred GPA proteins do not have only four variable positions at 17, 24, 35
and 41.

In a preferred embodiment, the GPA proteins of the invention are hG-CSF conformers. By
"conformer" herein is meant a protein that has a protein backbone 3D structure that is
virtually the same but has significant differences in the amino acid side chains. That is,
the GPA proteins of the invention define a conformer set, wherein all of the proteins of the
set share a backbone structure and yet have sequences that differ by at least 3-5%.
"Backbone" in this context means the non-side chain atoms: the nitrogen, carbonyl carbon and
oxygen, and the α-carbon, and the hydrogens attached to the nitrogen and α-carbon. To be
considered a conformer, a protein must have backbone atoms that are no more than 2 Å from the
hG-CSF structure, with no more than 1.5 Å being preferred, and no more than 1 Å being
particularly preferred. In general, these distances may be determined in two ways. In one
embodiment, each potential conformer is crystallized and its three dimensional structure
determined. Alternatively, as the former is quite tedious, the sequence of each potential
conformer is run in the PDA program to determine whether it is a conformer.

GPA proteins may also be identified as being encoded by GPA nucleic acids. In the case of the
nucleic acid, the overall homology of the nucleic acid sequence is commensurate with amino
acid homology but takes into account the degeneracy in the genetic code and codon bias of
different organisms. Accordingly, the nucleic acid sequence homology may be either lower or
higher than that of the protein sequence, with lower homology being preferred.

In a preferred embodiment, a GPA nucleic acid encodes a GPA protein. As will be appreciated
by those in the art, due to the degeneracy of the genetic code, an extremely large number of
nucleic acids may be made, all of which encode the GPA proteins of the present invention.
Thus, having identified a particular amino acid sequence, those skilled in the art could make
any number of different nucleic acids by simply modifying the sequence of one or more codons
in a way which does not change the amino acid sequence of the GPA.

In one embodiment, the nucleic acid homology is determined through hybridization studies.
Thus, for example, nucleic acids which hybridize under high stringency to the nucleic acid
sequences shown in FIG. 1 or its complement and encode a GPA protein are considered GPA
genes.

High stringency conditions are known in the art; see for example Maniatis et al., Molecular
Cloning: A Laboratory Manual, 2d Edition, 1989, and Short Protocols in Molecular Biology, ed.
Ausubel, et al., both of which are hereby incorporated by reference. Stringent conditions are
sequence-dependent and will be different in different circumstances. Longer sequences
hybridize specifically at higher temperatures. An extensive guide to the hybridization of
nucleic acids is found in Tijssen, Techniques in Biochemistry and Molecular Biology:
Hybridization with Nucleic Acid Probes, "Overview of principles of hybridization and the
strategy of nucleic acid assays" (1993). Generally, stringent conditions are selected to be
about 5-10° C. lower than the thermal melting point (Tm) for the specific sequence at a
defined ionic strength and pH. The Tm is the temperature (under defined ionic strength, pH
and nucleic acid concentration) at which 50% of the probes complementary to the target
hybridize to the target sequence at equilibrium (as the target sequences are present in
excess, at Tm, 50% of the probes are occupied at equilibrium). Stringent conditions will be
those in which the salt concentration is less than about 1.0 M sodium ion, typically about
0.01 to 1.0 M sodium ion concentration (or other salts) at pH 7.0 to 8.3 and the temperature
is at least about 30° C. for short probes (e.g. 10 to 50 nucleotides) and at least about
60° C. for long probes (e.g. greater than 50 nucleotides). Stringent conditions may also be
achieved with the addition of destabilizing agents such as formamide.

In another embodiment, less stringent hybridization conditions are used; for example,
moderate or low stringency conditions may be used, as are known in the art; see Maniatis and
Ausubel, supra, and Tijssen, supra.

The GPA proteins and nucleic acids of the present invention are recombinant. As used herein,
"nucleic acid" may refer to either DNA or RNA, or molecules which contain both deoxy- and
ribonucleotides. The nucleic acids include genomic DNA, cDNA and oligonucleotides including
sense and anti-sense nucleic acids. Such nucleic acids may also contain modifications in the
ribose-phosphate backbone to increase stability and half life of such molecules in
physiological environments.

The nucleic acid may be double stranded, single stranded, or contain portions of both double
stranded or single stranded sequence. As will be appreciated by those in the art, the
depiction of a single strand ("Watson") also defines the sequence of the other strand
("Crick"); thus the sequence depicted in FIG. 1 also includes the complement of the sequence.
By the term "recombinant nucleic acid" herein is meant nucleic acid, originally formed in
vitro, in general, by the manipulation of nucleic acid by endonucleases, in a form not
normally found in nature. Thus an isolated GPA nucleic acid, in a linear form, or an
expression vector formed in vitro by ligating DNA molecules that are not normally joined, are
both considered recombinant for the purposes of this invention. It is understood that once a
recombinant nucleic acid is made and reintroduced into a host cell or organism, it will
replicate non-recombinantly, i.e. using the in vivo cellular machinery of the host cell
rather than in vitro manipulations; however, such nucleic acids, once produced recombinantly,
although subsequently replicated non-recombinantly, are still considered recombinant for the
purposes of the invention.
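
The remark above that the degeneracy of the genetic code allows an extremely large number of nucleic acids to encode one protein can be made concrete by counting them. The sketch below multiplies the number of synonymous codons for each residue, using the standard genetic code; the function name and the short example peptides are hypothetical and are not sequences from this document.

```python
from math import prod

# Number of codons for each amino acid in the standard genetic code
# (one-letter codes; the 61 sense codons in total).
CODONS_PER_RESIDUE = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def count_coding_sequences(protein):
    """Number of distinct coding sequences (stop codon excluded) that
    translate to `protein` under the standard genetic code."""
    return prod(CODONS_PER_RESIDUE[residue] for residue in protein.upper())
```

Even a three-residue peptide such as `LSR` has 6 x 6 x 6 = 216 possible coding sequences, so the count grows very quickly with protein length.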

Similarly, a "recombinant protein" is a protein made using recombinant techniques, i.e.
through the expression of a recombinant nucleic acid as depicted above. A recombinant
protein is distinguished from naturally occurring protein by at least one or more
characteristics. For example, the protein may be isolated or purified away from some or all
of the proteins and compounds with which it is normally associated in its wild type host,
and thus may be substantially pure. For example, an isolated protein is unaccompanied by at
least some of the material with which it is normally associated in its natural state,
preferably constituting at least about 0.5%, more preferably at least about 5% by weight of
the total protein in a given sample. A substantially pure protein comprises at least about
75% by weight of the total protein, with at least about 80% being preferred, and at least
about 90% being particularly preferred. The definition includes the production of a GPA
protein from one organism in a different organism or host cell. Alternatively, the protein
may be made at a significantly higher concentration than is normally seen, through the use
of an inducible promoter or high expression promoter, such that the protein is made at
increased concentration levels. Furthermore, all of the GPA proteins outlined herein are in
a form not normally found in nature, as they contain amino acid substitutions, insertions
and deletions, with substitutions being preferred, as discussed below.

Also included within the definition of GPA proteins of the present invention are amino acid
sequence variants of the GPA sequences outlined herein and shown in the Figures. That is,
the GPA proteins may contain additional variable positions as compared to hG-CSF. These
variants fall into one or more of three classes: substitutional, insertional or deletional
variants. These variants ordinarily are prepared by site specific mutagenesis of nucleotides
in the DNA encoding a GPA protein, using cassette or PCR mutagenesis or other techniques
well known in the art, to produce DNA encoding the variant, and thereafter expressing the
DNA in recombinant cell culture as outlined above. However, variant GPA protein fragments
having up to about 100-150 residues may be prepared by in vitro synthesis using established
techniques. Amino acid sequence variants are characterized by the predetermined nature of
the variation, a feature that sets them apart from naturally occurring allelic or
interspecies variation of the GPA protein amino acid sequence. The variants typically
exhibit the same qualitative biological activity as the naturally occurring analogue,
although variants can also be selected which have modified characteristics as will be more
fully outlined below.
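
Substitutional variants are written in this document as original residue, position and replacement residue (for example Leu14Ile, or C17L in one-letter form). As a minimal sketch of how such labels can be handled programmatically, the code below parses three-letter labels and applies them to a sequence. The function names are illustrative assumptions, and the example sequence in the usage note is a made-up toy peptide, not hG-CSF.

```python
import re

# Three-letter to one-letter amino acid codes (standard nomenclature).
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Labels of the form <three-letter residue><position><three-letter residue>.
_LABEL = re.compile(r"^([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})$")


def parse_substitution(label):
    """Split a label such as 'Leu14Ile' into ('L', 14, 'I')."""
    match = _LABEL.match(label)
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    original, position, replacement = match.groups()
    return THREE_TO_ONE[original], int(position), THREE_TO_ONE[replacement]


def apply_substitutions(sequence, labels):
    """Return a copy of `sequence` with each labelled substitution applied.

    Positions are 1-based. The original residue named in each label is
    checked against the unmodified input sequence.
    """
    residues = list(sequence)
    for label in labels:
        original, position, replacement = parse_substitution(label)
        if not 1 <= position <= len(sequence):
            raise ValueError(f"{label}: position outside the sequence")
        if sequence[position - 1] != original:
            raise ValueError(f"{label}: sequence has {sequence[position - 1]} here")
        residues[position - 1] = replacement
    return "".join(residues)
```

For the toy peptide `MKLVCDE`, applying `Leu3Ile` and `Cys5Ala` changes the third and fifth residues and leaves the rest untouched. Checking the original residue catches labels that were numbered against a different reference sequence.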

While the site or region for introducing an amino acid sequence variation is predetermined,
the mutation per se need not be predetermined. For example, in order to optimize the
performance of a mutation at a given site, random mutagenesis may be conducted at the target
codon or region and the expressed GPA variants screened for the optimal combination of
desired activity. Techniques for making substitution mutations at predetermined sites in DNA
having a known sequence are well known, for example, M13 primer mutagenesis and PCR
mutagenesis. Screening of the mutants is done using assays of GPA protein activities.

Amino acid substitutions are typically of single residues; 
insertions usually will be on the order of from about 1 to 20 
amino acids, although considerably larger insertions may be 
tolerated. Deletions range from about 1 to about 20 residues, 
although in some cases deletions may be much larger. 

Substitutions, deletions, insertions or any combination 
thereof may be used to arrive at a final derivative. Generally 
these changes are done on a few amino acids to minimize the 
alteration of the molecule. However, larger changes may be 
tolerated in certain circumstances. When small alterations in 
the characteristics of the GPA protein are desired, substitu- 
tions are generally made in accordance with the following 
chart: 





Chart 1

Original Residue    Exemplary Substitutions

Ala                 Ser
Arg                 Lys
Asn                 Gln, His
Asp                 Glu
Cys                 Ser, Ala
Gln                 Asn
Glu                 Asp
Gly                 Pro
His                 Asn, Gln
Ile                 Leu, Val
Leu                 Ile, Val
Lys                 Arg, Gln, Glu
Met                 Leu, Ile
Phe                 Met, Leu, Tyr
Ser                 Thr
Thr                 Ser
Trp                 Tyr
Tyr                 Trp, Phe
Val                 Ile, Leu
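
As an illustration of how Chart 1 might be used when screening candidate substitutions, the sketch below stores the chart as a lookup table and tests whether a proposed change is one of the listed exemplary substitutions. The table is transcribed from Chart 1; the function name is an assumption made for this example.

```python
# Chart 1, transcribed as original residue -> exemplary substitutions.
CHART_1 = {
    "Ala": ["Ser"],
    "Arg": ["Lys"],
    "Asn": ["Gln", "His"],
    "Asp": ["Glu"],
    "Cys": ["Ser", "Ala"],
    "Gln": ["Asn"],
    "Glu": ["Asp"],
    "Gly": ["Pro"],
    "His": ["Asn", "Gln"],
    "Ile": ["Leu", "Val"],
    "Leu": ["Ile", "Val"],
    "Lys": ["Arg", "Gln", "Glu"],
    "Met": ["Leu", "Ile"],
    "Phe": ["Met", "Leu", "Tyr"],
    "Ser": ["Thr"],
    "Thr": ["Ser"],
    "Trp": ["Tyr"],
    "Tyr": ["Trp", "Phe"],
    "Val": ["Ile", "Leu"],
}


def is_chart1_substitution(original, replacement):
    """True if `replacement` is listed in Chart 1 for `original`.

    Residues that do not appear as an original residue in the chart
    (for example Pro) simply return False.
    """
    return replacement in CHART_1.get(original, [])
```

The chart is not symmetric (Ala lists Ser, but Ser lists only Thr), so the lookup is directional: the original residue must be passed first.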



Substantial changes in function or immunological identity 
are made by selecting substitutions that are less conservative 
than those shown in Chart 1. For example, substitutions may 
be made which more significantly affect: the structure of the 
polypeptide backbone in the area of the alteration, for 
example the alpha-helical or beta-sheet structure; the charge 
or hydrophobicity of the molecule at the target site; or the 
bulk of the side chain. The substitutions which in general are 
expected to produce the greatest changes in the polypep- 
tide's properties are those in which (a) a hydrophilic residue, 
e.g. seryl or threonyl, is substituted for (or by) a hydrophobic 
residue, e.g. leucyl, isoleucyl, phenylalanyl, valyl or alanyl; 
(b) a cysteine or proline is substituted for (or by) any other 
residue; (c) a residue having an electropositive side chain, 
e.g. lysyl, arginyl, or histidyl, is substituted for (or by) an 
electronegative residue, e.g. glutamyl or aspartyl; or (d) a 
residue having a bulky side chain, e.g. phenylalanine, is 
substituted for (or by) one not having a side chain, e.g. 
glycine. 
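
The four classes (a) to (d) above can be turned into a simple check that flags a proposed substitution as likely to produce a large change in properties. This is a sketch only: the residue sets contain just the examples named in the text (introduced there with "e.g."), so they are illustrative rather than exhaustive, and the function name is an assumption.

```python
from typing import List

# Example residues named in the text for each class (not exhaustive).
HYDROPHILIC = {"Ser", "Thr"}
HYDROPHOBIC = {"Leu", "Ile", "Phe", "Val", "Ala"}
CYS_OR_PRO = {"Cys", "Pro"}
ELECTROPOSITIVE = {"Lys", "Arg", "His"}
ELECTRONEGATIVE = {"Glu", "Asp"}
BULKY = {"Phe"}
NO_SIDE_CHAIN = {"Gly"}


def disruptive_classes(original: str, replacement: str) -> List[str]:
    """Return the classes (a)-(d) that a substitution falls into.

    The text says "substituted for (or by)", so each test is symmetric
    in the original and replacement residue.
    """
    if original == replacement:
        return []
    pair = {original, replacement}
    classes = []
    if pair & HYDROPHILIC and pair & HYDROPHOBIC:
        classes.append("a")
    if pair & CYS_OR_PRO:
        classes.append("b")
    if pair & ELECTROPOSITIVE and pair & ELECTRONEGATIVE:
        classes.append("c")
    if pair & BULKY and pair & NO_SIDE_CHAIN:
        classes.append("d")
    return classes
```

An empty result does not mean a substitution is conservative; it only means none of the four listed patterns matched with these example residue sets.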

The variants typically exhibit the same qualitative biological activity and will elicit the
same immune response as the original GPA protein, although variants also are selected to
modify the characteristics of the GPA proteins as needed. Alternatively, the variant may be
designed such that the biological activity of the GPA protein is altered. For example,
glycosylation sites may be altered or removed. Similarly, the biological function may be
altered; for example, in some instances it may be desirable to have more granulopoietic
activity.

The GPA proteins and nucleic acids of the invention can be made in a number of ways. As will
be appreciated by those in the art, it is possible to synthesize proteins using standard
techniques well known in the art. See for example Wilken et al., Curr. Opin. Biotechnol.
9:412-26 (1998), hereby expressly incorporated by reference.

Alternatively, and preferably, the proteins and nucleic acids of the invention are made using
recombinant techniques. Using the nucleic acids of the present invention which encode a GPA
protein, a variety of expression vectors are made. The expression vectors may be either
self-replicating extrachromosomal vectors or vectors which integrate into a host genome.
Generally, these expression vectors include transcriptional and translational regulatory
nucleic acid operably linked to the nucleic acid encoding the GPA protein. The term "control
sequences" refers to DNA sequences necessary for the expression of an operably linked coding
sequence in a particular host organism.

The control sequences that are suitable for prokaryotes, for example, include a promoter,
optionally an operator sequence, and a ribosome binding site. Eukaryotic cells are known to
utilize promoters, polyadenylation signals, and enhancers.

Nucleic acid is "operably linked" when it is placed into a functional relationship with
another nucleic acid sequence. For example, DNA for a presequence or secretory leader is
operably linked to DNA for a polypeptide if it is expressed as a preprotein that participates
in the secretion of the polypeptide; a promoter or enhancer is operably linked to a coding
sequence if it affects the transcription of the sequence; or a ribosome binding site is
operably linked to a coding sequence if it is positioned so as to facilitate translation.
Generally, "operably linked" means that the DNA sequences being linked are contiguous, and,
in the case of a secretory leader, contiguous and in reading phase. However, enhancers do not
have to be contiguous. Linking is accomplished by ligation at convenient restriction sites.
If such sites do not exist, the synthetic oligonucleotide adaptors or linkers are used in
accordance with conventional practice. The transcriptional and translational regulatory
nucleic acid will generally be appropriate to the host cell used to express the fusion
protein; for example, transcriptional and translational regulatory nucleic acid sequences
from Bacillus are preferably used to express the fusion protein in Bacillus. Numerous types
of appropriate expression vectors, and suitable regulatory sequences are known in the art for
a variety of host cells.

In general, the transcriptional and translational regulatory sequences may include, but are
not limited to, promoter sequences, ribosomal binding sites, transcriptional start and stop
sequences, translational start and stop sequences, and enhancer or activator sequences. In a
preferred embodiment, the regulatory sequences include a promoter and transcriptional start
and stop sequences.

Promoter sequences encode either constitutive or inducible promoters. The promoters may be
either naturally occurring promoters or hybrid promoters. Hybrid promoters, which combine
elements of more than one promoter, are also known in the art, and are useful in the present
invention. In a preferred embodiment, the promoters are strong promoters, allowing high
expression in cells, particularly mammalian cells, such as the CMV promoter, particularly in
combination with a Tet regulatory element.

In addition, the expression vector may comprise additional elements. For example, the
expression vector may have two replication systems, thus allowing it to be maintained in two
organisms, for example in mammalian or insect cells for expression and in a procaryotic host
for cloning and amplification. Furthermore, for integrating expression vectors, the
expression vector contains at least one sequence homologous to the host cell genome, and
preferably two homologous sequences which flank the expression construct. The integrating
vector may be directed to a specific locus in the host cell by selecting the appropriate
homologous sequence for inclusion in the vector. Constructs for integrating vectors are well
known in the art.

In addition, in a preferred embodiment, the expression vector contains a selectable marker
gene to allow the selection of transformed host cells. Selection genes are well known in the
art and will vary with the host cell used.

A preferred expression vector system is a retroviral vector system such as is generally
described in PCT/US97/01019 and PCT/US97/01048, both of which are hereby expressly
incorporated by reference.

The GPA nucleic acids are introduced into the cells. By "introduced into" or grammatical
equivalents herein is meant that the nucleic acids enter the cells in a manner suitable for
subsequent expression of the nucleic acid. The method of introduction is largely dictated by
the targeted cell type, discussed below. Exemplary methods include CaPO4 precipitation,
liposome fusion, lipofectin®, electroporation, viral infection, etc. The GPA nucleic acids
may stably integrate into the genome of the host cell (for example, with retroviral
introduction, outlined below), or may exist either transiently or stably in the cytoplasm
(i.e. through the use of traditional plasmids, utilizing standard regulatory sequences,
selection markers, etc.).

The GPA proteins of the present invention are produced by culturing a host cell transformed
with an expression vector containing nucleic acid encoding a GPA protein, under the
appropriate conditions to induce or cause expression of the GPA protein. The conditions
appropriate for GPA protein expression will vary with the choice of the expression vector and
the host cell, and will be easily ascertained by one skilled in the art through routine
experimentation. For example, the use of constitutive promoters in the expression vector will
require optimizing the growth and proliferation of the host cell, while the use of an
inducible promoter requires the appropriate growth conditions for induction. In addition, in
some embodiments, the timing of the harvest is important. For example, the baculoviral
systems used in insect cell expression are lytic viruses, and thus harvest time selection can
be crucial for product yield.

Appropriate host cells include yeast, bacteria, archebacteria, fungi, and insect and animal
cells, including mammalian cells. Of particular interest are Drosophila melanogaster cells,
Saccharomyces cerevisiae and other yeasts, E. coli, Bacillus subtilis, SF9 cells, C129 cells,
293 cells, Neurospora, BHK, CHO, COS, Pichia pastoris, etc.

In a preferred embodiment, the GPA proteins are expressed in mammalian cells. Mammalian
expression systems are also known in the art, and include retroviral systems. A mammalian
promoter is any DNA sequence capable of binding mammalian RNA polymerase and initiating the
downstream (3') transcription of a coding sequence for the fusion protein into mRNA. A
promoter will have a transcription initiating region, which is usually placed proximal to the
5' end of the coding sequence, and a TATA box, usually located 25-30 base pairs upstream of
the transcription initiation site. The TATA box is thought to direct RNA polymerase II to
begin RNA synthesis at the correct site. A mammalian promoter will also contain an upstream
promoter element (enhancer element), typically located within 100 to 200 base pairs upstream
of the TATA box. An upstream promoter element determines the rate at which transcription is
initiated and can act in either orientation. Of particular use as mammalian promoters are the
promoters from mammalian viral genes, since the viral genes are often highly expressed and
have a broad host range. Examples include the SV40 early promoter, mouse mammary tumor virus
LTR promoter, adenovirus major late promoter, herpes simplex virus promoter, and the CMV
promoter.

Typically, transcription termination and polyadenylation sequences recognized by mammalian
cells are regulatory regions located 3' to the translation stop codon and thus, together with
the promoter elements, flank the coding sequence. The 3' terminus of the mature mRNA is
formed by site-specific post-translational cleavage and polyadenylation. Examples of
transcription terminator and polyadenylation signals include those derived from SV40.

In a preferred embodiment, when combinations of variable positions are to be made, the
nucleic acids encoding the GPA proteins are made using a variety of combinatorial techniques.
For example, "shuffling" techniques such as are outlined in U.S. Pat. Nos. 5,811,238;
5,605,721 and 5,830,721, and related patents, all of which are hereby expressly incorporated
by reference.

In a preferred embodiment, multiple PCR reactions with pooled oligonucleotides is done, as is
generally depicted in FIG. 12. In this embodiment, overlapping oligonucleotides are
synthesized which correspond to the full length gene. Again, these oligonucleotides may
represent all of the different amino acids at each variant position or subsets.

In a preferred embodiment, these oligonucleotides are pooled in equal proportions and
multiple PCR reactions are performed to create full length sequences containing the
combinations of variable positions.

In a preferred embodiment, the different oligonucleotides are added in relative amounts
corresponding to a probability distribution table; that is, as shown in FIGS. 3-10, different
amino acids have different probabilistic chances of being at a particular position. Thus, for
example, as shown in FIG. 4, out of the top 1000 sequences, position 103 has valine 35% of
the time, leucine 26% of the time, and isoleucine 31% of the time. The multiple PCR reactions
thus result in full length sequences with the desired combinations of variable amino acids in
the desired proportions.

The total number of oligonucleotides needed is a function of the number of positions being
mutated and the number of mutations being considered at these positions:

(number of oligos for constant positions) + M_1 + M_2 + M_3 + ... + M_n = (total number of oligos required)

where M_n is the number of amino acids considered at position n in the sequence.

In a preferred embodiment, each overlapping oligonucleotide comprises only one position to be
varied; in alternate embodiments, the variant positions are too close together to allow this
and multiple variants per oligonucleotide are used to allow complete recombination of all the
possibilities. That is, each oligo can contain the codon for a single position being varied,
or for more than one position being varied. The multiple positions being varied must be close
in sequence to prevent the oligo length from being impractical. For multiple variable
positions on an oligonucleotide, particular combinations of variable residues can be included
or excluded in the library by including or excluding the oligonucleotide encoding that
combination. The total number of

positions are encoded by a single oligonucleotide. The annealed regions are the ones that
remain constant, i.e. have the sequence of the reference sequence.

Oligonucleotides with insertions or deletions of codons can be used to create a library
expressing different length proteins. In particular computational sequence screening for
insertions or deletions can result in secondary libraries defining different length proteins,
which can be expressed by a library of pooled oligonucleotide of different lengths.

In a preferred embodiment, error-prone PCR is done. See U.S. Pat. Nos. 5,605,793, 5,811,238,
and 5,830,721, all of which are hereby incorporated by reference. This can be done on the
optimal sequence or on top members of the GPA set. In this embodiment, the gene for the
optimal sequence found in the computational screen can be synthesized. Error prone PCR is
then performed on the optimal sequence gene in the presence of oligonucleotides that code for
the variable residues at the variant positions (bias oligonucleotides). The addition of the
oligonucleotides will create a bias favoring the incorporation of the variations in the
secondary library. Alternatively, only oligonucleotides for certain variations may be used to
bias the library.

In a preferred embodiment, error-prone PCR in combination with the overlapping
oligonucleotide method outlined in FIG. 12 is done.

In a preferred embodiment, gene shuffling with error prone PCR can be performed on the gene
for the optimal sequence, in the presence of bias oligonucleotides, to create a DNA sequence
library that reflects the proportion of the variations. The choice of the bias
oligonucleotides can be done in a variety of ways; they can be chosen on the basis of their
frequency, i.e. oligonucleotides encoding high variation frequency positions can be used;
alternatively, oligonucleotides containing the most variable positions can be used, such that
the diversity is increased; if the GPA protein set is ranked, some number of top scoring
positions can be used to generate bias oligonucleotides; random positions may be chosen; a
few top scoring and a few low scoring ones may be chosen; etc. What is important is to
generate new sequences based on preferred variable positions and sequences. Similarly, a top
set of GPA proteins may be "shuffled" using traditional shuffling methods or the overlapping
oligonucleotide methods of FIG. 12.

The methods of introducing exogenous nucleic acid into mammalian hosts, as well as other
hosts, are well known in the art, and will vary with the host cell used. Techniques include
dextran-mediated transfection, calcium phosphate precipitation, polybrene mediated
transfection, protoplast fusion, electroporation, viral infection, encapsulation of the
polynucleotide(s) in liposomes, and direct microinjection of the DNA into nuclei. As outlined
herein, a particularly preferred method utilizes retroviral infection, as outlined in PCT
US97/01019, incorporated by reference.

As will be appreciated by those in the art, the type of mammalian cells used in the present
invention can vary widely. Basically, any mammalian cells may be used, with mouse, rat,
primate and human cells being particularly preferred, although as will be appreciated by
those in the art, modifications of the system by pseudotyping allows all eukaryotic cells to
be used, preferably higher eukaryotes. As is more fully described below, a screen will be set
up such that the cells exhibit a selectable phenotype in the presence of a bioactive peptide.
As is more fully described below, cell types implicated in a wide variety of disease
conditions are particularly useful, so long as a suitable screen may be designed to allow the
selection of cells that exhibit an altered phenotype as a consequence of the presence of a
peptide

oligonucleotides required increases when multiple variable within the cell. 
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Accordingly, suitable cell types include, but are not 
limited to, tumor cells of all types (particularly melanoma, 
myeloid leukemia, carcinomas of the lung, breast, ovaries, 
colon, kidney, prostate, pancreas and testes), 
cardiomyocytes, endothelial cells, epithelial cells, lympho- 5 
cytes (T-cell and B cell), mast cells, eosinophils, vascular 
intimal cells, hepatocytes, leukocytes including mono- 
nuclear leukocytes, stem cells such as haemopoetic, neural, 
skin, lung, kidney, liver and myocyte stem cells (for use in 
screening for differentiation and de-differentiation factors), lO 
osteoclasts, chondrocytes and other connective tissue cells, 
keratinocytes, melanocytes, liver cells, kidney cells, and 
adipocytes. Suitable cells also include known research cells, 
including, but not limited to, Jurkat T cells, NIH3T3 cells, 
CHO, Cos, etc. See the AJ'CC cell line catalog, hereby 15 
expressly incorporated by reference. 
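
Returning briefly to the library-construction arithmetic stated earlier in
this section (the oligonucleotide count relation and the pooling of
oligonucleotides in proportion to a probability distribution table), the
bookkeeping can be illustrated with a minimal Python sketch. The function
names and the example counts are hypothetical and are not part of the
disclosure. Renormalizing over only the three residues quoted for position
103 (35%, 26% and 31%) is likewise an assumption made for illustration,
since the remaining 8% of sequences are not itemized in the text.

```python
from fractions import Fraction


def total_oligos(constant_oligos, variants_per_position):
    """Total = (oligos for constant positions) + M1 + M2 + ... + Mn.

    `variants_per_position` lists Mn, the number of amino acids
    considered at each variable position n.
    """
    return constant_oligos + sum(variants_per_position)


def pooling_fractions(percent_by_residue):
    """Convert observed residue percentages into relative pooling amounts.

    Assumption: only the listed residues are synthesized, so their
    percentages are renormalized to sum to 1.
    """
    total = sum(percent_by_residue.values())
    return {res: Fraction(pct, total) for res, pct in percent_by_residue.items()}


# Hypothetical example: 10 constant oligos, three variable positions
# with 3, 2 and 4 amino acids considered.
example_total = total_oligos(10, [3, 2, 4])

# Percentages quoted in the text for position 103 (FIG. 4).
position_103 = pooling_fractions({"Val": 35, "Leu": 26, "Ile": 31})
```

Exact fractions are used here so that the pooled amounts sum to one without
rounding error; in practice the amounts would be converted to volumes or
molar quantities.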

In one embodiment, the cells may be additionally genetically
engineered, that is, contain exogenous nucleic acid
other than the GPA nucleic acid.

In a preferred embodiment, the GPA proteins are
expressed in bacterial systems. Bacterial expression systems
are well known in the art.

A suitable bacterial promoter is any nucleic acid sequence
capable of binding bacterial RNA polymerase and initiating
the downstream (3') transcription of the coding sequence of
the GPA protein into mRNA. A bacterial promoter has a
transcription initiation region which is usually placed
proximal to the 5' end of the coding sequence. This transcription
initiation region typically includes an RNA polymerase
binding site and a transcription initiation site. Sequences
encoding metabolic pathway enzymes provide particularly
useful promoter sequences. Examples include promoter
sequences derived from enzymes that metabolize sugars such
as galactose, lactose and maltose, and sequences derived
from biosynthetic enzymes such as those for tryptophan. Promoters
from bacteriophage may also be used and are known in the
art. In addition, synthetic promoters and hybrid promoters
are also useful; for example, the tac promoter is a hybrid of
the trp and lac promoter sequences. Furthermore, a bacterial
promoter can include naturally occurring promoters of
non-bacterial origin that have the ability to bind bacterial RNA
polymerase and initiate transcription.

In addition to a functioning promoter sequence, an efficient
ribosome binding site is desirable. In E. coli, the
ribosome binding site is called the Shine-Dalgarno (SD)
sequence and includes an initiation codon and a sequence
3-9 nucleotides in length located 3-11 nucleotides upstream
of the initiation codon.
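
The spacing rule just described can be made concrete with a short Python
sketch that scans upstream of an initiation codon for Shine-Dalgarno-like
stretches. This is an illustration only: the 9-nucleotide consensus string,
the function name, the test sequence, and the reading of "3-11 nucleotides
upstream" as the gap between the end of the motif and the first base of the
initiation codon are all assumptions, not part of the disclosure.

```python
# Assumption: a commonly quoted 9-nt Shine-Dalgarno-like consensus (DNA alphabet).
SD_CONSENSUS = "TAAGGAGGT"


def find_sd_candidates(seq, start, min_len=3, max_len=9, min_gap=3, max_gap=11):
    """Return (motif, gap) pairs found upstream of the codon at index `start`.

    A candidate is any stretch of min_len..max_len nucleotides that is a
    substring of SD_CONSENSUS and ends `gap` nucleotides before `start`,
    with min_gap <= gap <= max_gap.
    """
    seq = seq.upper()
    hits = []
    for gap in range(min_gap, max_gap + 1):
        end = start - gap  # motif occupies seq[begin:end]
        for length in range(min_len, max_len + 1):
            begin = end - length
            if begin < 0:
                continue  # not enough upstream sequence for this candidate
            motif = seq[begin:end]
            if motif in SD_CONSENSUS:
                hits.append((motif, gap))
    return hits


# Hypothetical test sequence: AGGAGG, a 6-nt spacer, then ATG at index 16.
demo_seq = "GGCCAGGAGGACAGCTATGAAA"
demo_hits = find_sd_candidates(demo_seq, 16)
best_hit = max(demo_hits, key=lambda hit: len(hit[0]))
```

Exact substring matching is the simplest possible criterion; a practical
search would more likely score partial matches or base pairing to the 16S
rRNA, which this sketch does not attempt.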

The expression vector may also include a signal peptide
sequence that provides for secretion of the GPA protein in
bacteria. The signal sequence typically encodes a signal
peptide comprised of hydrophobic amino acids which direct
the secretion of the protein from the cell, as is well known
in the art. The protein is either secreted into the growth
media (gram-positive bacteria) or into the periplasmic space,
located between the inner and outer membrane of the cell
(gram-negative bacteria).

The bacterial expression vector may also include a selectable
marker gene to allow for the selection of bacterial
strains that have been transformed. Suitable selection genes
include genes which render the bacteria resistant to drugs
such as ampicillin, chloramphenicol, erythromycin,
kanamycin, neomycin and tetracycline. Selectable markers
also include biosynthetic genes, such as those in the histidine,
tryptophan and leucine biosynthetic pathways.

These components are assembled into expression vectors.
Expression vectors for bacteria are well known in the art,
and include vectors for Bacillus subtilis, E. coli,
Streptococcus cremoris, and Streptococcus lividans, among others.

The bacterial expression vectors are transformed into 
bacterial host cells using techniques well known in the art, 
such as calcium chloride treatment, electroporation, and 
others. 

In one embodiment, GPA proteins are produced in insect 
cells. Expression vectors for the transformation of insect 
cells, and in particular, baculovirus-based expression 
vectors, are well known in the art. 

In a preferred embodiment, GPA protein is produced in
yeast cells. Yeast expression systems are well known in the
art, and include expression vectors for Saccharomyces
cerevisiae, Candida albicans and C. maltosa, Hansenula
polymorpha, Kluyveromyces fragilis and K. lactis, Pichia
guilliermondii and P. pastoris, Schizosaccharomyces pombe,
and Yarrowia lipolytica. Preferred promoter sequences for
expression in yeast include the inducible GAL1,10
promoter, the promoters from alcohol dehydrogenase,
enolase, glucokinase, glucose-6-phosphate isomerase,
glyceraldehyde-3-phosphate-dehydrogenase, hexokinase,
phosphofructokinase, 3-phosphoglycerate mutase, pyruvate
kinase, and the acid phosphatase gene. Yeast selectable
markers include ADE2, HIS4, LEU2, TRP1, and ALG7,
which confers resistance to tunicamycin; the neomycin
phosphotransferase gene, which confers resistance to G418;
and the CUP1 gene, which allows yeast to grow in the
presence of copper ions.

In addition, the GPA polypeptides of the invention may be
further fused to other proteins, if desired, for example to
increase expression.

In one embodiment, the GPA nucleic acids, proteins and
antibodies of the invention are labeled with a label other than
the scaffold. By "labeled" herein is meant that a compound
has at least one element, isotope or chemical compound
attached to enable the detection of the compound. In general,
labels fall into three classes: a) isotopic labels, which may be
radioactive or heavy isotopes; b) immune labels, which may
be antibodies or antigens; and c) colored or fluorescent dyes.
The labels may be incorporated into the compound at any
position.

Once made, the GPA proteins may be covalently modified.
One type of covalent modification includes reacting
targeted amino acid residues of a GPA polypeptide with an
organic derivatizing agent that is capable of reacting with
selected side chains or the N- or C-terminal residues of a
GPA polypeptide. Derivatization with bifunctional agents is
useful, for instance, for crosslinking GPA to a
water-insoluble support matrix or surface for use in the method for
purifying anti-GPA antibodies or screening assays, as is
more fully described below. Commonly used crosslinking
agents include, e.g., 1,1-bis(diazoacetyl)-2-phenylethane,
glutaraldehyde, N-hydroxysuccinimide esters, for example,
esters with 4-azidosalicylic acid, homobifunctional
imidoesters, including disuccinimidyl esters such as
3,3'-dithiobis-(succinimidylpropionate), bifunctional
maleimides such as bis-N-maleimido-1,8-octane and agents such as
methyl-3-[(p-azidophenyl)dithio]propioimidate.

Other modifications include deamidation of glutaminyl and asparaginyl
residues to the corresponding glutamyl and aspartyl residues,
respectively, hydroxylation of proline and lysine, phosphorylation of
hydroxyl groups of seryl or threonyl residues, methylation of the
α-amino groups of lysine, arginine, and histidine side chains [T. E.
Creighton, Proteins: Structure and Molecular Properties, W. H. Freeman &
Co., San Francisco, pp. 79-86 (1983)], acetylation of the N-terminal
amine, and amidation of any C-terminal carboxyl group.

Another type of covalent modification of the GPA polypeptide included
within the scope of this invention comprises altering the native
glycosylation pattern of the polypeptide. "Altering the native
glycosylation pattern" is intended for purposes herein to mean deleting
one or more carbohydrate moieties found in native sequence GPA
polypeptide, and/or adding one or more glycosylation sites that are not
present in the native sequence GPA polypeptide.

Addition of glycosylation sites to GPA polypeptides may be accomplished
by altering the amino acid sequence thereof. The alteration may be made,
for example, by the addition of, or substitution by, one or more serine
or threonine residues to the native sequence GPA polypeptide (for
O-linked glycosylation sites). The GPA amino acid sequence may optionally
be altered through changes at the DNA level, particularly by mutating the
DNA encoding the GPA polypeptide at preselected bases such that codons
are generated that will translate into the desired amino acids.

Another means of increasing the number of carbohydrate moieties on the
GPA polypeptide is by chemical or enzymatic coupling of glycosides to the
polypeptide. Such methods are described in the art, e.g., in WO 87/05330
published Sep. 11, 1987, and in Aplin and Wriston, CRC Crit. Rev.
Biochem., pp. 259-306 (1981).

Removal of carbohydrate moieties present on the GPA polypeptide may be
accomplished chemically or enzymatically or by mutational substitution of
codons encoding for amino acid residues that serve as targets for
glycosylation. Chemical deglycosylation techniques are known in the art
and described, for instance, by Hakimuddin, et al., Arch. Biochem.
Biophys., 259:52 (1987) and by Edge et al., Anal. Biochem., 118:131
(1981). Enzymatic cleavage of carbohydrate moieties on polypeptides can
be achieved by the use of a variety of endo- and exo-glycosidases as
described by Thotakura et al., Meth. Enzymol., 138:350 (1987).

Another type of covalent modification of GPA comprises linking the GPA
polypeptide to one of a variety of nonproteinaceous polymers, e.g.,
polyethylene glycol, polypropylene glycol, or polyoxyalkylenes, in the
manner set forth in U.S. Pat. Nos. 4,640,835; 4,496,689; 4,301,144;
4,670,417; 4,791,192 or 4,179,337.

GPA polypeptides of the present invention may also be modified in a way
to form chimeric molecules comprising a GPA polypeptide fused to another,
heterologous polypeptide or amino acid sequence. In one embodiment, such
a chimeric molecule comprises a fusion of a GPA polypeptide with a tag
polypeptide which provides an epitope to which an anti-tag antibody can
selectively bind. The epitope tag is generally placed at the amino- or
carboxyl-terminus of the GPA polypeptide. The presence of such
epitope-tagged forms of a GPA polypeptide can be detected using an
antibody against the tag polypeptide. Also, provision of the epitope tag
enables the GPA polypeptide to be readily purified by affinity
purification using an anti-tag antibody or another type of affinity
matrix that binds to the epitope tag. In an alternative embodiment, the
chimeric molecule may comprise a fusion of a GPA polypeptide with an
immunoglobulin or a particular region of an immunoglobulin. For a
bivalent form of the chimeric molecule, such a fusion could be to the Fc
region of an IgG molecule.

Various tag polypeptides and their respective antibodies are well known
in the art. Examples include poly-histidine (poly-his) or
poly-histidine-glycine (poly-his-gly) tags; the flu HA tag polypeptide
and its antibody 12CA5 [Field et al., Mol. Cell. Biol., 8:2159-2165
(1988)]; the c-myc tag and the 8F9, 3C7, 6E10, G4, B7 and 9E10
antibodies thereto [Evan et al., Molecular and Cellular Biology,
5:3610-3616 (1985)]; and the Herpes Simplex virus glycoprotein D (gD)
tag and its antibody [Paborsky et al., Protein Engineering 3(6):547-553
(1990)]. Other tag polypeptides include the Flag-peptide [Hopp et al.,
BioTechnology, 6:1204-1210 (1988)]; the KT3 epitope peptide [Martin et
al., Science 255:192-194 (1992)]; tubulin epitope peptide [Skinner et
al., J. Biol. Chem., 266:15163-15166 (1991)]; and the T7 gene 10 protein
peptide tag [Lutz-Freyermuth et al., Proc. Natl. Acad. Sci. USA,
87:6393-6397 (1990)].

In a preferred embodiment, the GPA protein is purified or isolated after
expression. GPA proteins may be isolated or purified in a variety of ways
known to those skilled in the art depending on what other components are
present in the sample. Standard purification methods include
electrophoretic, molecular, immunological and chromatographic techniques,
including ion exchange, hydrophobic, affinity, and reverse-phase HPLC
chromatography, and chromatofocusing. For example, the GPA protein may be
purified using a standard anti-library antibody column. Ultrafiltration
and diafiltration techniques, in conjunction with protein concentration,
are also useful. For general guidance in suitable purification
techniques, see Scopes, R., Protein Purification, Springer-Verlag, NY
(1982). The degree of purification necessary will vary depending on the
use of the GPA protein. In some instances no purification will be
necessary. A preferred method for purification is outlined in the
examples.

Once made, the GPA proteins and nucleic acids of the invention find use
in a number of applications.

In a preferred embodiment, the GPA proteins are administered to a patient
to treat a G-CSF-associated disorder.

By "G-CSF associated disorder" or "neutropenic" or "G-CSF responsive
disorder" or "condition" herein is meant a disorder that can be
ameliorated by the administration of a compound with a GPA protein,
including, but not limited to, neutropenia associated with cancer
therapies including chemotherapy and radiation therapy; radiation
accidents; bone marrow transplantation; bone marrow suppression
conditions, for example those associated with AIDS; myelodysplastic
syndromes characterized by granulocyte functional abnormalities; severe
infections; etc. In addition, treatment with the GPA proteins of the
invention can be used to enhance peripheral blood progenitor cell
collection.

In a preferred embodiment, a therapeutically effective dose of a GPA
protein is administered to a patient. By "therapeutically effective dose"
herein is meant a dose that produces the effects for which it is
administered. The exact dose will depend on the purpose of the treatment,
and will be ascertainable by one skilled in the art using known
techniques. In a preferred embodiment, dosages of about 5 μg/kg are used,
administered either intravenously or subcutaneously. As is known in the
art, adjustments for GPA degradation, systemic versus localized delivery,
and rate of new protease synthesis, as well as the age, body weight,
general health, sex, diet, time of administration, drug interaction and
the severity of the condition may be necessary, and will be ascertainable
with routine experimentation by those skilled in the art.

A "patient" for the purposes of the present invention includes both
humans and other animals, particularly mammals, and organisms. Thus the
methods are applicable to both human therapy and veterinary applications.
In the preferred embodiment the patient is a mammal, and in the most
preferred embodiment the patient is human.

The administration of the GPA proteins of the present invention can be
done in a variety of ways, including, but not limited to, orally,
subcutaneously, intravenously, intranasally, transdermally,
intraperitoneally, intramuscularly, intrapulmonary, vaginally, rectally,
or intraocularly. In some instances, for example, in the treatment of
wounds and inflammation, the GPA protein may be directly applied as a
solution or spray.

The pharmaceutical compositions of the present invention
comprise a GPA protein in a form suitable for administration
to a patient. In the preferred embodiment, the pharmaceutical
compositions are in a water soluble form, such as being
present as pharmaceutically acceptable salts, which is meant
to include both acid and base addition salts.
"Pharmaceutically acceptable acid addition salt" refers to those salts
that retain the biological effectiveness of the free bases and
that are not biologically or otherwise undesirable, formed
with inorganic acids such as hydrochloric acid, hydrobromic
acid, sulfuric acid, nitric acid, phosphoric acid and the like,
and organic acids such as acetic acid, propionic acid,
glycolic acid, pyruvic acid, oxalic acid, maleic acid, malonic
acid, succinic acid, fumaric acid, tartaric acid, citric acid,
benzoic acid, cinnamic acid, mandelic acid, methanesulfonic
acid, ethanesulfonic acid, p-toluenesulfonic acid, salicylic
acid and the like. "Pharmaceutically acceptable base addition
salts" include those derived from inorganic bases such as
sodium, potassium, lithium, ammonium, calcium,
magnesium, iron, zinc, copper, manganese, aluminum salts
and the like.

Particularly preferred are the ammonium, potassium, 
sodium, calcium, and magnesium salts. Salts derived from 
pharmaceutically acceptable organic non-toxic bases 
include salts of primary, secondary, and tertiary amines, 
substituted amines including naturally occurring substituted 
amines, cyclic amines and basic ion exchange resins, such as 
isopropylamine, trimethylamine, diethylamine, 
triethylamine, tripropylamine, and ethanolamine. 

The pharmaceutical compositions may also include one or 
more of the following: carrier proteins such as serum 
albumin; buffers such as NaOAc; fillers such as microcrys- 
talline cellulose, lactose, corn and other starches; binding 
agents; sweeteners and other flavoring agents; coloring 
agents; and polyethylene glycol. Additives are well known 
in the art, and are used in a variety of formulations. 

In a preferred embodiment, GPA proteins are adminis- 
tered as therapeutic agents, and can be formulated as out- 
lined above. Similarly, GPA genes (including both the full- 
length sequence, partial sequences, or regulatory sequences 
of the GPA coding regions) can be administered in gene 
therapy applications, as is known in the art. These GPA 
genes can include antisense applications, either as gene 
therapy (i.e. for incorporation into the genome) or as anti- 
sense compositions, as will be appreciated by those in the 
art. 

In a preferred embodiment, the nucleic acid encoding the
GPA proteins may also be used in gene therapy. In gene
therapy applications, genes are introduced into cells in order
to achieve in vivo synthesis of a therapeutically effective
genetic product, for example for replacement of a defective
gene. "Gene therapy" includes both conventional gene
therapy where a lasting effect is achieved by a single
treatment, and the administration of gene therapeutic agents,
which involves the one time or repeated administration of a
therapeutically effective DNA or mRNA. Antisense RNAs
and DNAs can be used as therapeutic agents for blocking the
expression of certain genes in vivo. It has already been
shown that short antisense oligonucleotides can be imported
into cells where they act as inhibitors, despite their low
intracellular concentrations caused by their restricted uptake
by the cell membrane. (Zamecnik et al., Proc. Natl. Acad.
Sci. USA, 83:4143-4146 [1986]). The oligonucleotides
can be modified to enhance their uptake, e.g. by substituting
their negatively charged phosphodiester groups by
uncharged groups.
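
As an illustration of what "antisense" means at the sequence level, the
following Python sketch derives a DNA oligonucleotide that is the reverse
complement of a short mRNA target, so that it can base-pair with that
target. The function name and the six-base example target are hypothetical;
the sketch does not address oligonucleotide length selection, backbone
modification, or uptake, which the text discusses only in general terms.

```python
# Watson-Crick complement table for the DNA alphabet.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def antisense_dna(target_rna):
    """Return the DNA oligo (written 5'->3') complementary to an RNA target.

    The target is given 5'->3' in the RNA alphabet; U is treated as T,
    each base is complemented, and the result is reversed so that it is
    also written 5'->3'.
    """
    dna = target_rna.upper().replace("U", "T")
    unexpected = set(dna) - set("ACGT")
    if unexpected:
        raise ValueError(f"unexpected base(s) in target: {sorted(unexpected)}")
    return dna.translate(_COMPLEMENT)[::-1]


# Hypothetical target: the first six bases of a coding region.
demo_antisense = antisense_dna("AUGGCC")
```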

There are a variety of techniques available for introducing
nucleic acids into viable cells. The techniques vary depending
upon whether the nucleic acid is transferred into cultured
cells in vitro, or in vivo in the cells of the intended host.
Techniques suitable for the transfer of nucleic acid into
mammalian cells in vitro include the use of liposomes,
electroporation, microinjection, cell fusion, DEAE-dextran,
the calcium phosphate precipitation method, etc. The currently
preferred in vivo gene transfer techniques include
transfection with viral (typically retroviral) vectors and viral
coat protein-liposome mediated transfection [Dzau et al.,
Trends in Biotechnology, 11:205-210 (1993)]. In some
situations it is desirable to provide the nucleic acid source
with an agent that targets the target cells, such as an antibody
specific for a cell surface membrane protein or the target
cell, a ligand for a receptor on the target cell, etc. Where
liposomes are employed, proteins which bind to a cell
surface membrane protein associated with endocytosis may
be used for targeting and/or to facilitate uptake, e.g. capsid
proteins or fragments thereof tropic for a particular cell type,
antibodies for proteins which undergo internalization in
cycling, proteins that target intracellular localization and
enhance intracellular half-life. The technique of
receptor-mediated endocytosis is described, for example, by Wu et
al., J. Biol. Chem., 262:4429-4432 (1987); and Wagner et
al., Proc. Natl. Acad. Sci. U.S.A., 87:3410-3414 (1990). For
review of gene marking and gene therapy protocols see
Anderson et al., Science, 256:808-813 (1992).

In a preferred embodiment, GPA genes are administered
as DNA vaccines, either single genes or combinations of
GPA genes. Naked DNA vaccines are generally known in the
art. Brower, Nature Biotechnology, 16:1304-1305 (1998).
Methods for the use of genes as DNA vaccines are well
known to one of ordinary skill in the art, and include placing
a GPA gene or portion of a GPA gene under the control of
a promoter for expression in a GPA patient. The GPA gene
used for DNA vaccines can encode full-length GPA proteins,
but more preferably encodes portions of the GPA proteins
including peptides derived from the GPA protein. In a
preferred embodiment a patient is immunized with a DNA
vaccine comprising a plurality of nucleotide sequences
derived from a GPA gene. Similarly, it is possible to
immunize a patient with a plurality of GPA genes or portions
thereof as defined herein. Without being bound by theory,
following expression of the polypeptide encoded by the DNA vaccine,
cytotoxic T-cells, helper T-cells and antibodies are induced
which recognize and destroy or eliminate cells expressing
GPA proteins.

In a preferred embodiment, the DNA vaccines include a
gene encoding an adjuvant molecule with the DNA vaccine.
Such adjuvant molecules include cytokines that increase the
immunogenic response to the GPA polypeptide encoded by
the DNA vaccine. Additional or alternative adjuvants are
known to those of ordinary skill in the art and find use in the
invention.

The following examples serve to more fully describe the manner of using
the above-described invention, as well as to set forth the best modes
contemplated for carrying out various aspects of the invention. It is
understood that these examples in no way serve to limit the true scope of
this invention, but rather are presented for illustrative purposes. All
references cited herein are incorporated by reference in their entirety.

EXAMPLES

Design and Characterization of Novel GPA Proteins

Protein Design

Summary: Sequences for novel granulopoietic proteins (GPA proteins) were
designed by simultaneously optimizing residues in the buried core of the
protein using Protein Design Automation (PDA) as described in WO98/47089
and U.S. Ser. No. 09/127,926, both of which are expressly incorporated by
reference in their entirety. Several core designs were completed, with
25-34 residues considered, corresponding to 10^''-10^^ sequence
possibilities. Residues unexposed to solvent were designed in order to
minimize changes to the molecular surface and to limit the potential for
antigenicity of designed novel protein analogues. Calculations required
from 12-24 hours on 16 Silicon Graphics R10000 CPU's. The global optimum
sequence from each design was selected for characterization. From 10-14
residues were changed from hG-CSF in the designed proteins, out of 174
residues total. Additional designs were done where 14-24 boundary
positions were optimized resulting in 12-20 mutated residues. These
designs were repeated using the optimal sequence obtained from one of the
core designs as the template structure, again producing optimal sequences
with from 12-20 mutations. Only the global optimum sequences were
selected for experimental study because of the high stringency of PDA and
the very low false positive rate.

Computational Protocols

Template structure preparation: The template structure was produced using
homology modeling. The crystal structure of bovine G-CSF (PDB record
1bgc) was used as the starting point for modeling since the crystal
structure of human G-CSF is at lower resolution and is missing key
fragments including a restraining disulfide bond between positions 64 and
74. Bovine G-CSF also serves as a good model for human G-CSF since the
sequences are the same length and 142 out of 174 amino acids are
identical (81%). The 32 residues that differ in the bovine sequence were
replaced with the human residues for those positions and the
conformations of the replaced side chains were optimized using PDA. The
optimization was initially done on all the replaced residues except
position 167; typical PDA parameters were used (the van der Waals scale
factor was set to 0.9, the H-bond potential well-depth was set to 8.0
kcal/mol, and the solvation potential was calculated using type 2
solvation with a nonpolar burial energy of 0.048 kcal/mol and a nonpolar
exposure multiplication factor of 1.6). For position 167, the Gly in
bovine G-CSF was replaced with the human residue for this position (Val).

However, due to steric constraints between position 167 and the disulfide
bond between positions 64 and 74, the Val at this position was optimized
using less restrictive steric constraints (PDA was run using a van der
Waals scale factor of 0.7 instead of the typical value of 0.9). The
entire structure was then minimized for 50 steps using conjugate gradient
minimization and the Dreiding II force field. This minimized structure
was used as the template for all the designs.

Design strategies: Core residues were selected for design since
optimization of these positions can improve stability, although
stabilization has been obtained from modifications at other sites as
well. Core designs also minimize changes to the molecular surface and
thus limit the designed protein's potential for antigenicity. PDA
calculations were run on three core designs; core3 had 34 core positions
that were allowed to vary, core4 had 26, and core4v had 25 (see FIG. 2).
The core3 variable positions were selected from the entire length of the
helices, while core4 and core4v's variable positions were selected from
the interior (not at the ends) helices. Only hydrophobic amino acids were
allowed at the variable core positions. These included Ala, Phe, Ile,
Leu, Tyr and Trp. Gly was also allowed for the variable positions that
had Gly in the bovine wild type structure (positions 28, 149, 150, and
167). Met and Pro were not allowed.

Two boundary designs were also done; bndry4_2 had 24 variable boundary
residues, and bndry4_AD had 14 (see FIG. 2). The bndry4_AD design was
restricted to boundary residues on the outer two helices (A and D) since
initial calculations suggested that the most pronounced changes in
helical propensity result from modifications at these locations, and we
anticipated that improvements in helical propensity might lead to
improved stability. Two additional boundary designs were done
(bndry4_2_core4 and bndry4_AD_core4) which allowed the same boundary
positions to vary but used the optimal sequence from the core4 design as
the template. That is, these designs were required to keep the 10 core
mutations (amino acid and conformation) that resulted from the core4 PDA
calculations (see FIG. 3). The boundary designs allowed the following
amino acids at the variable positions: Ala, Val, Leu, Ile, Asp, Asn, Glu,
Gln, Lys, Ser, Thr, and Hsp (a protonated His). Met, Pro, Cys, Gly, Arg,
and the aromatics Trp, Tyr, and Phe were not allowed.

PDA Calculations

The PDA calculations for all the designs were run using the a2h1p0
rotamer library. This library is based on the backbone-dependent rotamer
library of Dunbrack and Karplus (Dunbrack and Karplus, 1993) but includes
more rotamers for the aromatic and hydrophobic amino acids; χ1 and χ2
angle values of rotamers for all the aromatic amino acids and χ1 angle
values for all the other hydrophobic amino acids were expanded ±1
standard deviation about the mean value reported in the Dunbrack and
Karplus library. Typical PDA parameters were used: the van der Waals
scale factor was set to 0.9, the H-bond potential well-depth was set to
8.0 kcal/mol, the solvation potential was calculated using type 2
solvation with a nonpolar burial energy of 0.048 kcal/mol and a nonpolar
exposure multiplication factor of 1.6, and the secondary structure scale
factor was set to 0.0 (secondary structure propensities were not
considered). Calculations required from 12-24 hours on 16 Silicon
Graphics R10000 CPU's.

Optimal Sequences

The optimal sequence selected by PDA for each of the designs is shown in
FIG. 3. In the core designs, from 10 to 14 residues were changed compared
to wild type, while the boundary designs produced 20 mutations for
bndry4_2 (all four helices designed) and 12 mutations for bndry4_AD (only
A and D helices designed). Including the core4 mutants in the template
resulted in the same number of boundary mutations (20 for bndry4_core4;
12 for bndry4_AD_core4), but different amino acids were selected at some
of the mutated positions.

Monte Carlo Analysis

Monte Carlo analysis of the sequences produced by PDA shows the ground
state (optimal) amino acid and amino acids allowed for each variable
position and their frequencies of occurrence (see FIGS. 4 through 10).

Cloning and Expression

Summary: A gene for met hG-CSF was synthesized from partially overlapping
oligonucleotides (approximately 100 bases) that were extended and PCR
amplified; see FIG. IB. Codon usage was optimized for E. coli and several
restriction sites were incorporated to ease future cloning. These partial
genes were cloned into a vector and transformed into E. coli for
sequencing. Several of these gene fragments were then cloned into
adjacent positions in an expression vector (pET17 or pET21) to form the
full length gene for met hG-CSF (528 bases) and transformed into E. coli
for expression. Protein was expressed in E. coli in insoluble inclusion
bodies (data not shown) and its identity was confirmed by immunoblot of
SDS-PAGE using a commercial Mab against hG-CSF. A similar strategy was
followed for all of the novel GPA proteins and all were expressed (data
not shown).

Cloning

To clone the gene, pairs of partially complementary oligonucleotides were
synthesized and annealed by heating

around the expected elution time for each protein was collected and the
buffer was exchanged into 10 mM NaOAc at pH 4 for biophysical
characterization. For long term storage, a buffer of 5% sorbitol, 0.004%
Tween 80, and 10 mM NaOAc at pH 4 was used. The proteins were >98% pure
as judged by reversed phase HPLC on a C4 column (3.9 mm x 150 mm) with
linear acetonitrile-water gradient containing 0.1% TFE.

Isolation and refolding from inclusion bodies: To isolate inclusion
bodies, the E. coli cells were pelleted by centrifugation at 8000 rpm in
a Beckman J2-17 rotor. The cells were re-suspended in 50 mM Tris.HCl pH
8.0, 10 mM MgCl2 at 5 mls per gram of pelleted cells. Lysozyme was added
to a final concentration of 0.1 mg/ml, and the cells were incu-

to 70** C. for 10 min and cooling to room temperature. The 15 bated at 30° C. for 30 min. The cells were then rapidly frozen 

overlapping oligonucleotides (100 mers) were extended and thawed, and DNase 1 was added to a concentration of 

using Klenow fragment for 1 hour at 37° C. These extended 10 //g/ml. After incubation at 37° C. for 30 minutes, the 

oligonucleotides were then used as templates for PCR with inclusion bodies were isolated by centrifugation at 12 000 

primers complementary to the terminal 20 nucleotides of rpm for 30 min and washed twice with 50 mM Tris.HCl 

each end. PCR products were cloned into the vector pCR- 20 pHS.O, 10 mM MgCl2. 

Blunt (Invitrogen) according to the manufacturer's The protein precipitate was washed and fully solubilized 

recommendations, and transformed into Gibco-BRL Sub- in 2% sarkosyl, 50 mM Tris, pH 8.0, CuSO^ was then added 

cloning Efficiency E. coli DH5a cells. The DNAs from into the mixture to reach a concentration of 20 uM. The 

several colonies were isolated using a Qiagen Miniprep Spin mixture was stirred for 8-10 hours to refold the proteins by 

Kit, and sequenced by an Applied Biosystems 377XL auto- 25 forming disulfide bonds under air oxidation, 

mated flourescent DNA sequencer. Spectroscopic Characterization 

Expression Summary: Protein structure was assessed by circular 

To express the protein, sequenced genes were subcloned dichroism (CD). The CD spectra for met hG-CSF and the 

between the Ndel and Xhol sites of Novagen's pET21a (+) GPA proteins tested were nearly identical to each other and 

vector and transformed into £. coli BL21 (DE3) cells. 30 to published spectra of met hG-CSE These spectra indicate 

Protein expression was induced by growing the E. coli cells highly similar secondary structure and tertiary folds for the 

in Circlegrow media (Bio 101) with shaking at 37° C. to a GPA proteins and met hG-CSF. Thermal stability was 

density of 0.5 OD^jq. IPTG was then added to a final assessed by monitoring the temperature dependence of the 

concentration of 1 mM, and growth was allowed to continue CD signal at 222 nm, a wavelength diagnostic for helical 

for a further 3 hours. The expressed protein incorporated a 35 protein structure. The thermal stabilities of the proteins are 

Met at the N-terminus; our numbering begins with the next shown in FIG. 13, with core4 approximately 10° C. more 

residue, a Thr. stable than met hG-CSF and core3 and core4v having very 

To confirm expression of the protein, 10 //I samples were similar thermal stabilities to met hG-CSF. As in previously 

removed prior to addition of IFFG and at the end of the three published PDA designed proteins, the origin of the increased 

hour incubation. 'Iliese samples were electrophoresed 40 stability likely results from an improved balance between 

through a 15% SDS-polyacrylamide gel and stained with packing interactions and hydrophobic burial of side chains. 

Coomassie blue R-250. Expression of protein with the The thermal stabilities of three additional GPA proteins 

expected molecular weight could readily be observed. Con- (smO, fm3 and fm4) derived by reverting some of the core 

firmation that the protein was GCSF was obtained by mutant positions to wild type arc shown in FIG. 16. 

immunoblot analysis using monoclonal antibodies directed 45 Spectroscopic Characterization 

to either the N-terminal 20 amino acids or the C-terminal 18 The concentrations of the proteins were determined by 

amino acids (Santa Cruz Biotechnology). U V spectroscopy at 280 nm using the extinction coeflBcients 

Isolation and Purification shown in FIG. 16. CD spectra were measured on an Aviv 

Summary: Protein was isolated by solubilizing the inclu- 202DS spectrometer equipped with a Peltier temperature 

sion bodies in detergent and refolding the protein in the 50 control unit. The ellipticity was calibrated with (+)-10- 

presence of CUSO4 to promote formation of native disulfide camphorsulfonic acid. The thermal transition curves were 

bonds. The solubilized protein mixture was loaded onto a recorded at 222 nm in a buffer of 10 mM NaO Ac at pH 4.0 

size exclusion column to separate monomeric protein from every 2.5° C. with an averaging time of 5 s and an equili- 

aggregates and contaminants from the preparation. Fractions bration time of 3 min. ITie melting temperature (T^) value 

containing monomeric met hG-CSF were collected and 55 of each protein was derived from the derivative curve of the 

assessed for purity by reversed phase HPLC. Greater than ellipticity at 222 nm vs. temperature. The T^ values were 

95% purity was confirmed. The designed GPA proteins reproducible to within 1° for the same protein at the con- 

eluted slightly later than wildlype met hG-CSF. centraiions used (--0.1 mg/ml). Thermal denaturation curves 

HPLC purification: The mixture was directly loaded onto arc shown in FIG. 13. The T„'s for core4, core4v and core3 

the size exclusion column (10 mmx300 mm loaded with 60 and three proteins derived fi*om them (smO, fm4 and fm7) 

superdex prep 75 resin purchased from Pharmacia) and are shown in FIG. 16. 

eluted at a flow rate of 0.8 ml/mi n using the column buffer In Vitro Biological Activity 

(100 mM Na2S04, 50 mM Tris, pH 7,5). The peaks are Summary: FIG. 14 shows the dose response curves for 

monitered by UV detector at dual wavelengths of 214 and met hG-CSF and three GPA proteins. Mouse leukocytes 

280 nm. Albumin, carbonic an hydrate, cytochrome C and 65 were transfected with human G-CSF receptor, making leu- 

aprotinin were used to calibrate the molecular size of kocyle proliferation dependent on G-CSF signaling activity 

proteins versus elution time. The monomeric peak that elules via the G-CSF receptor. Leukocyte proliferation is measured 
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by incorporation of bronainated uracil (BrdU) measured by 
ELISA. CPA protein granulopoietic activity is measured by 
quantifying cell proliferation as a function of protein con- 
centration. Two hG-CSF samples were also tested, one 
produced as described herein and a commercially available 
hG-CSF from R&D Systems. Dose response curves were 
very similar for all of the proteins tested, except for core4, 
which showed approximately two times the potency of met 
hG-CSF. FIG. 15 shows the appearance of a typical 96 well 
plate ELISA of control samples with met hG-CSF. The 
statistical analysis of the dose response assay (8 replicates) 
shows that core4 was highly significantly more potent than 
the other GPA proteins and met hG-CSF. 'Ilie origin of this 
effect is unclear, and could be from increased affinity for the 
receptor, increased stability of core4 under cell culture assay 
conditions, or a combination. 

Cell culture: The cells used in the proliferation assay were Ba/F3 (murine lymphoid) cells stably transfected with the gene encoding the human Class 1 GCSF receptor (a kind gift from Dr. Belinda Avalos, Ohio State University). These cells were maintained in RPMI medium 1640 (Gibco-BRL) at 5% CO2, 37° C. in high humidity. They were passaged every 2-3 days by a 1 in 10 dilution into fresh media.

Cell proliferation assay: Cell proliferation in response to GCSF was detected by 5-bromo-2'-deoxyuridine (BrdU) incorporation quantified by a BrdU-specific ELISA kit as described by the manufacturer (Boehringer Mannheim). Briefly, 1x10^ to 1x10^ Ba/F3 cells/ml were incubated with varying amounts of GCSF (1x10^ pg/ml to 1x10^ pg/ml) for 42 hrs before the addition of 10 BrdU. After further incubation of 22 hrs, the cells were lysed and the DNA denatured using FixDenat (Boehringer Mannheim). Incorporation of BrdU into DNA was then quantified with an ELISA that utilizes a peroxidase-conjugated monoclonal antibody against BrdU. Peroxidase activity was measured at 450 nm by a BioRad Model 550 microtitre plate reader. Typically, each experiment contained 8 replicates spread over 4 plates. Data were analyzed by Kaleidagraph (Synergy Software) and Statistica (Statsoft).
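For illustration only, the comparison of dose response curves described above can be sketched as a half-maximal concentration estimate followed by a potency ratio. This is a minimal sketch and not the analysis performed with Kaleidagraph or Statistica; the function name `ec50_by_interpolation` and the absorbance values are hypothetical, and the estimate uses simple linear interpolation on a log10 concentration axis rather than a fitted curve.

```python
import math

def ec50_by_interpolation(concs, responses):
    """Estimate the half-maximal concentration of a rising dose response.

    concs     : strictly increasing, positive concentrations (e.g. pg/ml)
    responses : mean signal at each concentration (e.g. absorbance at 450 nm)
    The half-maximal level is taken midway between the lowest and highest
    observed responses (an assumption of this sketch).
    """
    low, high = min(responses), max(responses)
    half = low + (high - low) / 2.0
    for i in range(len(concs) - 1):
        r0, r1 = responses[i], responses[i + 1]
        if r0 <= half <= r1 and r1 > r0:
            # Interpolate linearly in log10(concentration).
            frac = (half - r0) / (r1 - r0)
            log_c0 = math.log10(concs[i])
            log_c1 = math.log10(concs[i + 1])
            return 10 ** (log_c0 + frac * (log_c1 - log_c0))
    raise ValueError("response never crosses the half-maximal level")

# Hypothetical data: a reference protein and a more potent variant.
concs = [1.0, 10.0, 100.0, 1000.0, 10000.0]
reference = [0.0, 0.0, 1.0, 2.0, 2.0]
variant = [0.0, 0.4, 1.6, 2.0, 2.0]

# A ratio above 1 means the variant reaches half-maximal response
# at a lower concentration than the reference.
potency_ratio = (ec50_by_interpolation(concs, reference)
                 / ec50_by_interpolation(concs, variant))
```

With replicate plates, the same estimate could be computed per replicate so that the potency ratios can be compared statistically.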
Storage Stability

The storage stability of core4 was assessed by incubation at both 37 and 50° C. under solution conditions identical in composition to that used in the commercial formulation of Neupogen. Accelerated degradation was followed by observing the disappearance of monomeric protein with size exclusion chromatography, since aggregation is the predominant mechanism of inactivation of G-CSF. Even under optimized formulation conditions, core4 is significantly more stable than met hG-CSF (FIG. 15).
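As an illustration of the melting temperature determination described under Spectroscopic Characterization (the Tm taken from the derivative curve of the ellipticity at 222 nm vs. temperature), the following is a minimal sketch. The function name `melting_temperature` and the synthetic melt are hypothetical, and the central-difference derivative is an assumption; the text does not state how the derivative curve was computed.

```python
import math

def melting_temperature(temps, ellipticity):
    """Return the temperature where |d(ellipticity)/dT| is largest.

    temps       : increasing temperatures in degrees C
    ellipticity : CD signal at 222 nm recorded at each temperature
    The derivative is approximated by central differences on interior
    points, so the result is limited to the sampling grid.
    """
    if len(temps) != len(ellipticity) or len(temps) < 3:
        raise ValueError("need at least three matched data points")
    best_temp, best_slope = None, -1.0
    for i in range(1, len(temps) - 1):
        slope = (ellipticity[i + 1] - ellipticity[i - 1]) / (temps[i + 1] - temps[i - 1])
        if abs(slope) > best_slope:
            best_temp, best_slope = temps[i], abs(slope)
    return best_temp

# Synthetic two-state melt sampled every 2.5 degrees C, midpoint at 60 C.
temps = [25.0 + 2.5 * i for i in range(25)]
curve = [-20.0 + 15.0 / (1.0 + math.exp(-(t - 60.0) / 3.0)) for t in temps]
tm = melting_temperature(temps, curve)
```

Because the estimate is restricted to sampled temperatures, interpolation around the extremum or a fitted transition curve would be needed to resolve differences smaller than the 2.5° C. step.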



SEQUENCE LISTING 

<160> NUMBER OF SEQ ID NOS: 18

<210> SEQ ID NO 1 

<211> LENGTH: 526 

<212> TYPE: DNA 

<213> ORGANISM: Homo sapiens 

<400> SEQUENCE: 1 

atgactccat taggtccagc ttcctctctg ccgcaaagct tcctgctgaa atgcctggaa 60 

caggttcgta aaatccaggg tgatggtgct gctctgcagg aaaaactgtg cgctacctac 120 

aaactgtgcc atccggaaga actggttctg ctgggtcact ccctgggtat cccgtgggcg 180 

ccgctgagct cctgcccgag ccaggctctg cagctggctg gttgcctgtc ccaattgcac 240 

agcggccttt tcctgtacca gggtctgctg caagctctgg aaggtactcc ccggaactgg 300 

gtccgaccct ggacactctg cagctggacg tcgctgactt cgctaccacc atctggcagc 360 

agatggaaga actgggtatg gctccggctc tgcagccgac ccagggtgct atgccggctt 420 

tcgttccgct ttccagcgtc gcgcaggtgg cgttctggtt gctagccacc tgcagagctt 480

cctggangtt tcctaccgtg ttctgcgtca cctggctcag ccgtga 526 



<210> SEQ ID NO 2
<211> LENGTH: 174
<212> TYPE: PRT
<213> ORGANISM: Homo sapiens
<400> SEQUENCE: 2

Thr Pro Leu Gly Pro Ala Ser Ser Leu Pro Gln Ser Phe Leu Leu Lys
1 5 10 15

Cys Leu Glu Gln Val Arg Lys Ile Gln Gly Asp Gly Ala Ala Leu Gln
20 25 30

Glu Lys Leu Cys Ala Thr Tyr Lys Leu Cys His Pro Glu Glu Leu Val
35 40 45

Leu Leu Gly His Ser Leu Gly Ile Pro Trp Ala Pro Leu Ser Ser Cys
50 55 60

Pro Ser Gln Ala Leu Gln Leu Ala Gly Cys Leu Ser Gln Leu His Ser
65 70 75 80

Gly Leu Phe Leu Tyr Gln Gly Leu Leu Gln Ala Leu Glu Gly Ile Ser
85 90 95

Pro Glu Leu Gly Pro Thr Leu Asp Thr Leu Gln Leu Asp Val Ala Asp
100 105 110

Phe Ala Thr Thr Ile Trp Gln Gln Met Glu Glu Leu Gly Met Ala Pro
115 120 125

Ala Leu Gln Pro Thr Gln Gly Ala Met Pro Ala Phe Ala Ser Ala Phe
130 135 140

Gln Arg Arg Ala Gly Gly Val Leu Val Ala Ser His Leu Gln Ser Phe
145 150 155 160

Leu Glu Val Ser Tyr Arg Val Leu Arg His Leu Ala Gln Pro
165 170



<210> SEQ ID NO 3
<211> LENGTH: 175
<212> TYPE: PRT
<213> ORGANISM: Artificial sequence
<220> FEATURE:
<223> OTHER INFORMATION: synthetic
<220> FEATURE:
<221> NAME/KEY: mat_peptide
<222> LOCATION: (2)..()
<400> SEQUENCE: 3

Met Thr Pro Leu Gly Pro Ala Ser Ser Leu Pro Gln Ser Phe Ile Leu
-1 1 5 10 15

Lys Cys Leu Glu Leu Val Arg Lys Ile Gln Gly Glu Gly Ala Ala Leu
20 25 30

Ile Glu Ile Leu Cys Ala Lys Tyr Lys Leu Cys His Pro Glu Glu Leu
35 40 45

Val Leu Leu Gly His Ser Leu Gly Ile Pro Trp Ala Pro Leu Ser Ser
50 55 60

Cys Pro Ser Gln Ala Leu Gln Leu Ala Gly Cys Leu Ser Gln Leu Leu
65 70 75

Ser Gly Leu Phe Leu Tyr Gln Gly Leu Leu Gln Lys Leu Glu Gly Ile
80 85 90 95

Ser Pro Glu Val Gly Pro Ile Leu Asp Thr Leu Ile Leu Glu Val Ala
100 105 110

Asp Phe Ala Thr Ile Ile Trp Gln Leu Met Glu Glu Leu Gly Met Ala
115 120 125

Pro Ala Leu Gln Pro Thr Gln Gly Ala Met Pro Ala Phe Ala Ser Ala
130 135 140

Phe Gln Lys Glu Asp Gly Gly Val Leu Val Ala Ile Leu Leu Gln Ser
145 150 155

Phe Leu Glu Val Ala Tyr Arg Val Leu Arg His Leu Ala Gln Pro
160 165 170



<210> SEQ ID NO 4
<211> LENGTH: 175
<212> TYPE: PRT
<213> ORGANISM: Artificial sequence
<220> FEATURE:
<223> OTHER INFORMATION: synthetic
<220> FEATURE:
<221> NAME/KEY: mat_peptide
<222> LOCATION: (2)..()
<400> SEQUENCE: 4

Met Thr Pro Leu Gly Pro Ala Ser Ser Leu Pro Gln Ser Phe Ile Leu
-1 1 5 10 15

Lys Leu Leu Glu Leu Val Arg Lys Ile Gln Gly Glu Ala Ala Ala Leu
20 25 30

Leu Glu Glu Leu Cys Ala His Tyr Lys Leu Cys His Pro Glu Glu Leu
35 40 45

Val Leu Leu Gly His Ser Leu Gly Ile Pro Trp Ala Pro Leu Ser Ser
50 55 60

Cys Pro Ser Gln Ala Leu Gln Leu Ala Gly Cys Leu Ser Gln Phe Leu
65 70 75

Ser Gly Leu Phe Leu Phe Gln Gly Leu Leu Gln Lys Leu Glu Gly Ile
80 85 90 95

Ser Pro Glu Leu Gly Pro Lys Val Asp Thr Leu Ile Leu Glu Ile Ala
100 105 110

Asp Leu Ala Thr Ile Ile Trp Gln Leu Met Glu Glu Leu Gly Met Ala
115 120 125

Pro Ala Leu Gln Pro Thr Gln Gly Ala Met Pro Ala Phe Ala Ser Ala
130 135 140

Phe Gln Lys Glu Asp Gly Gly Ile Leu Ile Ala Ile Leu Leu Gln Ser
145 150 155

Phe Leu Glu Val Ala Tyr Arg Val Phe Arg His Leu Ala Gln Pro
160 165 170



<210> SEQ ID NO 5
<211> LENGTH: 175
<212> TYPE: PRT
<213> ORGANISM: Artificial sequence
<220> FEATURE:
<223> OTHER INFORMATION: synthetic
<220> FEATURE:
<221> NAME/KEY: mat_peptide
<222> LOCATION: (2)..()
<400> SEQUENCE: 5

Met Thr Pro Leu Gly Pro Ala Ser Ser Leu Pro Gln Ser Phe Ile Leu
-1 1 5 10 15

Lys Cys Leu Glu Leu Val Arg Lys Ile Gln Gly Glu Gly Ala Ala Leu
20 25 30

Ile Glu Glu Leu Cys Ala His Tyr Lys Leu Cys His Pro Glu Glu Leu
35 40 45

Val Leu Leu Gly His Ser Leu Gly Ile Pro Trp Ala Pro Leu Ser Ser
50 55 60

Cys Pro Ser Gln Ala Leu Gln Leu Ala Gly Cys Leu Ser Gln Leu His
65 70 75

Ser Gly Leu Phe Leu Tyr Gln Gly Leu Leu Gln Ala Leu Glu Gly Ile
80 85 90 95

Ser Pro Glu Leu Gly Pro Thr Leu Asp Thr Leu Gln Leu Asp Val Ala
100 105 110

Asp Phe Ala Thr Thr Ile Trp Gln Gln Met Glu Glu Leu Gly Met Ala
115 120 125

Pro Ala Leu Gln Pro Thr Gln Gly Ala Met Pro Ala Phe Ala Ser Ala
130 135 140

Phe Gln Lys Glu Thr Gly Gly Val Leu Val Ala Ile Leu Leu Gln Ser
145 150 155

Phe Leu Glu Val Ala Tyr Arg Val Leu Arg His Leu Ala Gln Pro
160 165 170



<210> SEQ ID NO 6
<211> LENGTH: 175
<212> TYPE: PRT
<213> ORGANISM: Artificial sequence
<220> FEATURE:
<223> OTHER INFORMATION: synthetic
<220> FEATURE:
<221> NAME/KEY: mat_peptide
<222> LOCATION: (2)..()
<400> SEQUENCE: 6

Met Thr Pro Leu Gly Pro Ala Ser Ser Leu Pro Gln Ser Phe Ile Leu
-1 1 5 10 15

Lys Leu Leu Glu Leu Val Arg Lys Ile Gln Gly Glu Ala Ala Ala Leu
20 25 30

Leu Glu Glu Leu Cys Ala His Tyr Lys Leu Cys His Pro Glu Glu Leu
35 40 45

Val Leu Leu Gly His Ser Leu Gly Ile Pro Trp Ala Pro Leu Ser Ser
50 55 60

Cys Pro Ser Gln Ala Leu Gln Leu Ala Gly Cys Leu Ser Gln Phe His
65 70 75

Ser Gly Leu Phe Leu Phe Gln Gly Leu Leu Gln Ala Leu Glu Gly Ile
80 85 90 95

Ser Pro Glu Leu Gly Pro Thr Val Asp Thr Leu Gln Leu Asp Ile Ala
100 105 110

Asp Leu Ala Thr Thr Ile Trp Gln Gln Met Glu Glu Leu Gly Met Ala
115 120 125

Pro Ala Leu Gln Pro Thr Gln Gly Ala Met Pro Ala Phe Ala Ser Ala
130 135 140

Phe Gln Lys Glu Asp Gly Gly Ile Leu Ile Ala Ile Leu Leu Gln Ser
145 150 155

Phe Leu Glu Val Ala Tyr Arg Val Phe Arg His Leu Ala Gln Pro
160 165 170



<210> SEQ ID NO 7
<211> LENGTH: 175
<212> TYPE: PRT
<213> ORGANISM: Artificial sequence
<220> FEATURE:
<223> OTHER INFORMATION: synthetic
<220> FEATURE:
<221> NAME/KEY: mat_peptide
<222> LOCATION: (2)..()
<400> SEQUENCE: 7

Met Thr Pro Leu Gly Pro Ala Ser Ser Leu Pro Gln Ser Phe Leu Leu
-1 1 5 10 15

Lys Leu Leu Glu Gln Val Arg Lys Ile Gln Gly Asp Ala Ala Ala Leu
20 25 30

Gln Glu Lys Leu Cys Ala Thr Tyr Lys Leu Cys His Pro Glu Glu Leu
35 40 45

Val Leu Leu Gly His Ser Leu Gly Ile Pro Trp Ala Pro Leu Ser Ser
50 55 60

Cys Pro Ser Gln Ala Leu Gln Leu Ala Gly Cys Leu Ser Gln Phe His
65 70 75

Ser Gly Leu Phe Leu Phe Gln Gly Leu Leu Gln Ala Leu Glu Gly Ile
80 85 90 95

Ser Pro Glu Leu Gly Pro Thr Val Asp Thr Leu Gln Leu Asp Ile Ala
100 105 110

Asp Leu Ala Thr Thr Ile Trp Gln Gln Met Glu Glu Leu Gly Met Ala
115 120 125

Pro Ala Leu Gln Pro Thr Gln Gly Ala Met Pro Ala Phe Ala Ser Ala
130 135 140

Phe Gln Arg Arg Ala Gly Gly Ile Leu Ile Ala Ser His Leu Gln Ser
145 150 155

Phe Leu Glu Val Ser Tyr Arg Val Phe Arg His Leu Ala Gln Pro
160 165 170



<210> SEQ ID NO 8
<211> LENGTH: 175
<212> TYPE: PRT
<213> ORGANISM: Artificial sequence
<220> FEATURE:
<223> OTHER INFORMATION: synthetic
<220> FEATURE:
<221> NAME/KEY: mat_peptide
<222> LOCATION: (2)..()
<400> SEQUENCE: 8

Met Thr Pro Leu Gly Pro Ala Ser Ser Leu Pro Gln Ser Phe Leu Leu
-1 1 5 10 15

Lys Leu Leu Glu Gln Ile Arg Lys Ile Gln Gly Asp Ala Ala Ala Leu
20 25 30

Gln Glu Lys Leu Cys Ala Thr Tyr Lys Leu Cys His Pro Glu Glu Leu
35 40 45

Val Leu Leu Gly His Ser Leu Gly Ile Pro Trp Ala Pro Leu Ser Ser
50 55 60

Cys Pro Ser Gln Ala Leu Gln Leu Ala Gly Cys Leu Ser Gln Phe His
65 70 75

Ser Gly Leu Phe Leu Phe Gln Gly Leu Leu Gln Ala Leu Glu Gly Ile
80 85 90 95

Ser Pro Glu Leu Gly Pro Thr Leu Asp Thr Leu Gln Leu Asp Ile Ala
100 105 110

Asp Leu Ala Thr Thr Ile Trp Gln Gln Met Glu Glu Leu Gly Met Ala
115 120 125

Pro Ala Leu Gln Pro Thr Gln Gly Ala Met Pro Ala Phe Ala Ser Ala
130 135 140

Phe Gln Arg Arg Ala Gly Gly Ile Leu Ile Ala Ser His Ile Gln Ser
145 150 155

Trp Phe Glu Val Ser Tyr Arg Ala Phe Arg His Leu Ala Gln Pro
160 165 170



<210> SEQ ID NO 9
<211> LENGTH: 175
<212> TYPE: PRT
<213> ORGANISM: Artificial sequence
<220> FEATURE:
<223> OTHER INFORMATION: synthetic
<220> FEATURE:
<221> NAME/KEY: mat_peptide
<222> LOCATION: (2)..()
<400> SEQUENCE: 9

Met Thr Pro Leu Gly Pro Ala Ser Ser Leu Pro Gln Ser Phe Leu Leu
-1 1 5 10 15

Lys Leu Leu Glu Gln Val Arg Lys Ile Gln Gly Asp Ala Ala Ala Leu
20 25 30

Gln Glu Lys Ile Cys Ala Thr Tyr Lys Leu Cys His Pro Glu Glu Leu
35 40 45

Val Leu Leu Gly His Ser Leu Gly Ile Pro Trp Ala Pro Leu Ser Ser
50 55 60

Cys Pro Ser Gln Ala Leu Gln Leu Ala Gly Cys Leu Ser Gln Phe His
65 70 75

Ser Gly Leu Phe Leu Phe Gln Gly Leu Phe Gln Ala Phe Glu Gly Ile
80 85 90 95

Ser Pro Glu Leu Gly Pro Thr Leu Asp Thr Leu Gln Leu Asp Val Ala
100 105 110

Asp Leu Ala Thr Thr Ile Trp Gln Gln Met Glu Glu Leu Gly Met Ala
115 120 125

Pro Ala Leu Gln Pro Thr Gln Gly Ala Met Pro Ala Phe Ala Ser Ala
130 135 140

Phe Gln Arg Arg Ala Gly Gly Ile Leu Ile Ala Ser His Leu Gln Ser
145 150 155

Phe Leu Glu Val Ser Tyr Arg Val Phe Arg His Leu Ala Gln Pro
160 165 170



<210> SEQ ID NO 10
<211> LENGTH: 175
<212> TYPE: PRT
<213> ORGANISM: Artificial sequence
<220> FEATURE:
<223> OTHER INFORMATION: synthetic
<220> FEATURE:
<221> NAME/KEY: mat_peptide
<222> LOCATION: (2)..()
<400> SEQUENCE: 10

Met Thr Pro Leu Gly Pro Ala Ser Ser Leu Pro Gln Ser Phe Leu Leu
-1 1 5 10 15

Lys Ala Leu Glu Gln Val Arg Lys Ile Gln Gly Asp Ala Ala Ala Leu
20 25 30

Gln Glu Lys Leu Cys Ala Thr Tyr Lys Leu Cys His Pro Glu Glu Leu
35 40 45

Val Leu Leu Gly His Ser Leu Gly Ile Pro Trp Ala Pro Leu Ser Ser
50 55 60

Cys Pro Ser Gln Ala Leu Gln Leu Ala Gly Cys Leu Ser Gln Leu His
65 70 75

Ser Gly Leu Phe Leu Tyr Gln Gly Leu Leu Gln Ala Leu Glu Gly Ile
80 85 90 95

Ser Pro Glu Leu Gly Pro Thr Leu Asp Thr Leu Gln Leu Asp Val Ala
100 105 110

Asp Phe Ala Thr Thr Ile Trp Gln Gln Met Glu Glu Leu Gly Met Ala
115 120 125

Pro Ala Leu Gln Pro Thr Gln Gly Ala Met Pro Ala Phe Ala Ser Ala
130 135 140

Phe Gln Arg Arg Ala Gly Gly Val Leu Val Ala Ser His Leu Gln Ser
145 150 155

Phe Leu Glu Val Ser Tyr Arg Val Leu Arg His Leu Ala Gln Pro
160 165 170



<210> SEQ ID NO 11
<211> LENGTH: 175
<212> TYPE: PRT
<213> ORGANISM: Artificial sequence
<220> FEATURE:
<223> OTHER INFORMATION: synthetic
<220> FEATURE:
<221> NAME/KEY: mat_peptide
<222> LOCATION: (2)..()
<400> SEQUENCE: 11

Met Thr Pro Leu Gly Pro Ala Ser Ser Leu Pro Gln Ser Phe Leu Leu
-1 1 5 10 15

Lys Ala Leu Glu Gln Val Arg Lys Ile Gln Gly Asp Ala Ala Ala Leu
20 25 30

Gln Glu Lys Leu Cys Ala Thr Tyr Lys Leu Cys His Pro Glu Glu Leu
35 40 45

Val Leu Leu Gly His Ser Leu Gly Ile Pro Trp Ala Pro Leu Ser Ser
50 55 60

Cys Pro Ser Gln Ala Leu Gln Leu Ala Gly Cys Leu Ser Gln Leu His
65 70 75

Ser Gly Leu Phe Leu Tyr Gln Gly Leu Leu Gln Ala Leu Glu Gly Ile
80 85 90 95

Ser Pro Glu Leu Gly Pro Thr Leu Asp Thr Leu Gln Leu Asp Val Ala
100 105 110

Asp Phe Ala Thr Thr Ile Trp Gln Gln Met Glu Glu Leu Gly Met Ala
115 120 125

Pro Ala Leu Gln Pro Thr Gln Gly Ala Met Pro Ala Phe Ala Ser Ala
130 135 140

Phe Gln Arg Arg Ala Gly Gly Ile Leu Ile Ala Ser His Leu Gln Ser
145 150 155

Phe Leu Glu Val Ser Tyr Arg Val Leu Arg His Leu Ala Gln Pro
160 165 170



<210> SEQ ID NO 12
<211> LENGTH: 175
<212> TYPE: PRT
<213> ORGANISM: Artificial sequence
<220> FEATURE:
<223> OTHER INFORMATION: synthetic
<220> FEATURE:
<221> NAME/KEY: mat_peptide
<222> LOCATION: (2)..()
<400> SEQUENCE: 12

Met Thr Pro Leu Gly Pro Ala Ser Ser Leu Pro Gln Ser Phe Leu Leu
-1 1 5 10 15

Lys Leu Leu Glu Gln Val Arg Lys Ile Gln Gly Asp Ala Ala Ala Leu
20 25 30

Gln Glu Lys Leu Cys Ala Thr Tyr Lys Leu Cys His Pro Glu Glu Leu
35 40 45

Val Leu Leu Gly His Ser Leu Gly Ile Pro Trp Ala Pro Leu Ser Ser
50 55 60

Cys Pro Ser Gln Ala Leu Gln Leu Ala Gly Cys Leu Ser Gln Phe His
65 70 75

Ser Gly Leu Phe Leu Tyr Gln Gly Leu Leu Gln Ala Leu Glu Gly Ile
80 85 90 95

Ser Pro Glu Leu Gly Pro Thr Leu Asp Thr Leu Gln Leu Asp Val Ala
100 105 110

Asp Leu Ala Thr Thr Ile Trp Gln Gln Met Glu Glu Leu Gly Met Ala
115 120 125

Pro Ala Leu Gln Pro Thr Gln Gly Ala Met Pro Ala Phe Ala Ser Ala
130 135 140

Phe Gln Arg Arg Ala Gly Gly Val Leu Val Ala Ser His Leu Gln Ser
145 150 155

Phe Leu Glu Val Ser Tyr Arg Val Phe Arg His Leu Ala Gln Pro
160 165 170



<210> SEQ ID NO 13
<211> LENGTH: 175
<212> TYPE: PRT
<213> ORGANISM: Artificial sequence
<220> FEATURE:
<223> OTHER INFORMATION: synthetic
<220> FEATURE:
<221> NAME/KEY: mat_peptide
<222> LOCATION: (2)..()
<400> SEQUENCE: 13

Met Thr Pro Leu Gly Pro Ala Ser Ser Leu Pro Gln Ser Phe Leu Leu
-1 1 5 10 15

Lys Leu Leu Glu Gln Val Arg Lys Ile Gln Gly Asp Ala Ala Ala Leu
20 25 30

Gln Glu Lys Leu Cys Ala Thr Tyr Lys Leu Cys His Pro Glu Glu Leu
35 40 45

Val Leu Leu Gly His Ser Leu Gly Ile Pro Trp Ala Pro Leu Ser Ser
50 55 60

Cys Pro Ser Gln Ala Leu Gln Leu Ala Gly Cys Leu Ser Gln Phe His
65 70 75

Ser Gly Leu Phe Leu Tyr Gln Gly Leu Leu Gln Ala Leu Glu Gly Ile
80 85 90 95

Ser Pro Glu Leu Gly Pro Thr Leu Asp Thr Leu Gln Leu Asp Val Ala
100 105 110

Asp Leu Ala Thr Thr Ile Trp Gln Gln Met Glu Glu Leu Gly Met Ala
115 120 125

Pro Ala Leu Gln Pro Thr Gln Gly Ala Met Pro Ala Phe Ala Ser Ala
130 135 140

Phe Gln Arg Arg Ala Gly Gly Ile Leu Ile Ala Ser His Leu Gln Ser
145 150 155

Phe Leu Glu Val Ser Tyr Arg Val Phe Arg His Leu Ala Gln Pro
160 165 170



<210> SEQ ID NO 14
<211> LENGTH: 175
<212> TYPE: PRT
<213> ORGANISM: Artificial sequence
<220> FEATURE:
<223> OTHER INFORMATION: synthetic
<220> FEATURE:
<221> NAME/KEY: mat_peptide
<222> LOCATION: (2)..()
<400> SEQUENCE: 14

Met Thr Pro Leu Gly Pro Ala Ser Ser Leu Pro Gln Ser Phe Leu Leu
-1 1 5 10 15

Lys Leu Leu Glu Gln Val Arg Lys Ile Gln Gly Asp Ala Ala Ala Leu
20 25 30

Gln Glu Lys Leu Cys Ala Thr Tyr Lys Leu Cys His Pro Glu Glu Leu
35 40 45

Val Leu Leu Gly His Ser Leu Gly Ile Pro Trp Ala Pro Leu Ser Ser
50 55 60

Cys Pro Ser Gln Ala Leu Gln Leu Ala Gly Cys Leu Ser Gln Phe His
65 70 75

Ser Gly Leu Phe Leu Phe Gln Gly Leu Leu Gln Ala Leu Glu Gly Ile
80 85 90 95

Ser Pro Glu Leu Gly Pro Thr Leu Asp Thr Leu Gln Leu Asp Val Ala
100 105 110

Asp Leu Ala Thr Thr Ile Trp Gln Gln Met Glu Glu Leu Gly Met Ala
115 120 125

Pro Ala Leu Gln Pro Thr Gln Gly Ala Met Pro Ala Phe Ala Ser Ala
130 135 140

Phe Gln Arg Arg Ala Gly Gly Ile Leu Ile Ala Ser His Leu Gln Ser
145 150 155

Phe Leu Glu Val Ser Tyr Arg Val Phe Arg His Leu Ala Gln Pro
160 165 170



<210> SEQ ID NO 15
<211> LENGTH: 175
<212> TYPE: PRT
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: mat_peptide
<222> LOCATION: (2)..()
<400> SEQUENCE: 15

Met Thr Pro Leu Gly Pro Ala Ser Ser Leu Pro Gln Ser Phe Leu Leu
-1 1 5 10 15

Lys Cys Leu Glu Gln Val Arg Lys Ile Gln Gly Asp Gly Ala Ala Leu
20 25 30

Gln Glu Lys Leu Cys Ala Thr Tyr Lys Leu Cys His Pro Glu Glu Leu
35 40 45

Val Leu Leu Gly His Ser Leu Gly Ile Pro Trp Ala Pro Leu Ser Ser
50 55 60

Cys Pro Ser Gln Ala Leu Gln Leu Ala Gly Cys Leu Ser Gln Leu His
65 70 75

Ser Gly Leu Phe Leu Tyr Gln Gly Leu Leu Gln Ala Leu Glu Gly Ile
80 85 90 95

Ser Pro Glu Leu Gly Pro Thr Leu Asp Thr Leu Gln Leu Asp Val Ala
100 105 110

Asp Phe Ala Thr Thr Ile Trp Gln Gln Met Glu Glu Leu Gly Met Ala
115 120 125

Pro Ala Leu Gln Pro Thr Gln Gly Ala Met Pro Ala Phe Ala Ser Ala
130 135 140

Phe Gln Arg Arg Ala Gly Gly Val Leu Val Ala Ser His Leu Gln Ser
145 150 155

Phe Leu Glu Val Ser Tyr Arg Val Leu Arg His Leu Ala Gln Pro
160 165 170



<210> SEQ ID NO 16 
<211> LENGTH: 528
<212> TYPE: DNA 

<213> ORGANISM: Artificial sequence 
<220> FEATURE: 

<223> OTHER INFORMATION: synthetic 
<400> SEQUENCE: 16 



atgactccat taggtccagc ttcctctctg ccgcaaagct tcctgctgaa actgctggaa 60

caggttcgta aaatccaggg tgatgcagct gctctgcagg aaaaaatctg cgctacctac 120

aaactgtgcc atccggaaga actggttctg ctgggtcact ccctgggtat cccgtgggcg 180 

ccgctgagct cctgcccgag ccaggctctg cagctggctg gttgcctgtc ccaattccac 240 

agcggccttt tcctgttcca gggtctgttc caggctttcg aaggtatctc cccggaactg 300 

ggtccgaccc tggncactct gcagctggac gtcgctgacc tggctaccac catctggcag 360 

cagatggaag aactgggtat ggctccggct ctgcagccga cccagggtgc tatgccggct 420 

ttcgcttccc ctttccagcg tcgcgcaggt ggcatcctga tcgctagcca cctgcagagc 480 

ttcctggaag tttcctaccg tgttttccgt cacctggctc agccgtga 528 

<210> SEQ ID NO 17 
<211> LENGTH: 528
<212> TYPE: DNA 

<213> ORGANISM: Artificial sequence 
<220> FEATURE: 

<223> OTHER INFORMATION: synthetic 
<400> SEQUENCE: 17 

atgactccat taggtccagc ttcctctctg ccgcaaagct tcctgctgaa actgctggaa 60 

caggttcgta aaatccaggg tgatgcagct gctctgcagg aaaaactgtg cgctacctac 120

aaactgtgcc atccggaaga actggttctg ctgggtcact ccctgggtat cccgtgggcg 180 

ccgctgagct cctgcccgag ccaggctctg cagctggctg gttgcctgtc ccaattccac 240

agcggccttt tcctgttcca gggtctgctg caagctctgg aaggtatctc cccggaactg 300 

ggtccgaccg ttgacactct gcagctggac atcgctgacc tggctaccac catctggcag 360

cagatggaag aactgggtat ggctccggct ctgcagccga cccagggtgc tatgccggct 420 

ttcgcttccg ctttccagcg tcgcgcaggt ggcatcctga tcgctagcca cctgcagagc 480 

ttcctggaag tttcctaccg tgttttccgt cacctggctc agccgtga 528 

<210> SEQ ID NO 18 
<211> LENGTH: 528 
<212> TYPE: DNA 

<213> ORGANISM: Artificial sequence 
<220> FEATURE:

<223> OTHER INFORMATION: synthetic 
<400> SEQUENCE: 18 

atgactccat taggtccagc ttcctctctg ccgcaaagct tcctgctgaa actgctggaa 60 

cagatccgta aaatccaggg tgatgcagct gctctgcagg aaaaactgtg cgctacctac 120 

aaactgtgcc atccggaaga actggttctg ctgggtcact ccctgggtat cccgtgggcg 180 

ccgctgagct cctgcccgag ccaggctctg cagctggctg gttgcctgtc ccaattccac 240 

agcggccttt tcctgttcca gggtctgctg caagctctgg aaggtatctc cccggaactg 300 

ggtccgaccc tggacactct gcagctggac atcgctgacc tggctaccac catctggcag 360 

cagatggaag aactgggtat ggctccggct ctgcagccga cccagggtgc tatgccggct 420 

ttcgcttccg ctttccagcg tcgcgcaggt ggcatcctga tcgctagcca catccagagc 480 

tggttcgaag tttcctaccg tgctttccgt cacctggctc agccgtga 528 
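As a consistency check on the listing above, the DNA entries can be translated and compared with the wild type protein using the numbering convention stated in the Expression section (the N-terminal Met is not counted; numbering begins with the following Thr). The following is a minimal sketch under that assumption; the helper names `translate` and `substitutions` are hypothetical, and only the first 60 nucleotides of SEQ ID NO: 1 and SEQ ID NO: 17 are used.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(dna):
    """Translate a DNA string (spaces allowed) until the first stop codon.

    Codons containing ambiguous bases such as 'n' are reported as 'X'.
    """
    seq = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

def substitutions(wild_type, variant):
    """List differences as e.g. 'C17L'; index 0 is the uncounted Met."""
    return [f"{w}{i}{v}"
            for i, (w, v) in enumerate(zip(wild_type, variant))
            if i > 0 and w != v]

# First 60 nucleotides of SEQ ID NO: 1 and SEQ ID NO: 17.
wt_dna = "atgactccat taggtccagc ttcctctctg ccgcaaagct tcctgctgaa atgcctggaa"
var_dna = "atgactccat taggtccagc ttcctctctg ccgcaaagct tcctgctgaa actgctggaa"

wt_protein = translate(wt_dna)    # MTPLGPASSLPQSFLLKCLE
var_protein = translate(var_dna)  # MTPLGPASSLPQSFLLKLLE
changes = substitutions(wt_protein, var_protein)  # ['C17L']
```

Within this fragment the only difference is Cys to Leu at position 17, which agrees with the 17L substitution recited for SEQ ID NO: 7.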
We claim:

1. A non-naturally occurring GPA protein comprising at 
least five amino acid substitutions as compared to hG-CSF 
protein, wherein at least five of said substitutions are 
selected from the amino acid residues at positions selected 5 
from 14, 17, 20, 21, 24, 27, 28, 31, 32, 34, 35. 38, 78, 79, 
85, 89, 91, 92, 99, 102, 103, 107, 109, 110, 113, 116, 120, 
145, 146, 147, 148, 151. 153, 155, 156, 157, 160, 161, 163, 
164, 167, 168 and 170. 

2. A non-naturally occurring GPA protein according to 10 
claim 1 wherein said GPA protein has at least 10 amino acid 
substitutions. 

3. A non-naturally occurring GPA protein according to 
claim 2 wherein 10 of said substitutions are at positions 17, 
28, 78, 85, 103, 110, 113, 151, 153 and 168. 

4. A non-naturally occurring GPA protein according to 
claim 3 wherein said substitutions are 17L, 28A, 78F, 85F, 
103V, 1101, 113L, 1511, 1531 and 168F (SEQ ID NO: 7). 

5. A non-naturally occurring GPA protein according to 
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20. A non-naturally occurring GPA polypeptide compris- 
ing the amino acid sequence of SEQ ID NO: 11. 

21. A non-naturally occurring GPA protein comprising at 
least five amino acid substitutions as compared to hG-CSF 
protein, wherein at least five of said substitutions are 
selected firom the amino acid residues at positions selected 
from 17L, 171; 21V, 211: 24V, 241; 28A, 28L; 31V, 31L; 78F; 
85F, 85Y; 89L, 89F; 103V, 103L, 1031; llOV, llOL, 1101; 
113L; 1511; 153V, 1531; 157L, 1571; 160F, 160W; 161L, 
161F; and 168F. 

22. A non-naturally occurring GPA protein comprising at 
least five amino acid substitutions as compared to hG-CSF 
protein, wherein at least five of said substitutions are 

j5 selected from the group consisting of 17L, 17V, 171, 21V, 
211, 21F; 241, 24V; 28A, 28L; 31L, 31A, 31V, 311; 78F, 78V; 
82L, 82F; 85F, 85V, 851, 85Y; 89L, 89F, 89W; 103V, 103A, 
103L, 1031; 106L, 106V; 1101, llOV, llOL; 113L; 1511, 
1531, 153V; 157L, 157V, 1571; 160F, 160W; 161L, 161F; 



and 168F.

23. A non-naturally occurring GPA protein comprising at
least five amino acid substitutions as compared to hG-CSF
protein, wherein at least five of said substitutions are
selected from the group consisting of 17L, 17V, 17I; 21V,
^^^^ 21I, 21F, 21Y; 24I, 24A, 24V, 24L; 28A, 28L; 31L,
^1^; 35I, 35V; 78F, 78A, 78V, 78L, 78I, 78Y; 82L, 82A,
82F; 85F, 85W; 89F, 89L, 89W; 92F; 103L; 103L, 103I;
106L, 106V; 110V, 110A, 110L, 110I; 113L, 113A, 113F;
117I, 117A, 117V, 117L, 117F, 117W; 151I, 153I; 157L,
157V, 157I; 160F, 160W; 161L, 161A, 161V, 161F;
168F.

24. A non-naturally occurring GPA protein comprising at
least five amino acid substitutions as compared to hG-CSF
protein, wherein at least five of said substitutions are
selected from the group consisting of 14I; 20L; 27E; 32I;
34K, 34I, 34F; 38V, 38I, 38E, 38K; 79L; 91K; 99V, 99L

claim 1, wherein at least five of said substitutions are
selected from the amino acid residues at positions selected
from 14, 20, 27, 32, 34, 38, 79, 91, 102, 107, 109, 116, 120,
146, 147, 148, 155, 156 and 163.

6. A GPA protein according to claim 5 wherein said
substitutions are 14I, 20L, 27E, 32L, 34E, 38H, 79L, 91K,
102K, 107I, 109E, 116I, 120L, 146K, 147E, 148D, 155I,
156L and 163A (SEQ ID NO: 18).

7. A recombinant nucleic acid encoding the non-naturally
occurring GPA protein of claim 1.

8. An expression vector comprising the recombinant
nucleic acid of claim 7.

9. A host cell comprising the expression vector of claim
10. A host cell comprising the recombinant nucleic acid of
claim 1. 

11. A method of producing a non-naturally occurring GPA 
protein comprising culturing the host cell of claim 10 under 
conditions suitable for expression of said nucleic acid. 

12. The method according to claim 11 further comprising
recovering said GPA protein.

13. A pharmaceutical composition comprising a GPA
protein according to claim 1 and a pharmaceutical carrier.

14. A non-naturally occurring GPA protein according to
claim 1, wherein at least five of said substitutions comprises
substitutions at positions selected from 14, 17, 20, 27, 28,
32, 34, 35, 38, 78, 79, 85, 89, 91, 92, 102, 103, 107, 109, 110,
113, 116, 120, 146, 147, 148, 151, 153, 155, 156, 164, 167,
and 168.

102L, 102I; 107I; 109E, 109V; 116I, 116L, 116K; 120L;
145Q, 145E; 146K, 146Q; 147E; 148T, 148A, 148D; 155I;
156L; 164A; 170H, 170L, 170E, and 170Q.

25. A non-naturally occurring GPA protein comprising at
least five amino acid substitutions as compared to hG-CSF
protein, wherein at least five of said substitutions are
selected from the group consisting of 14I, 14L; 20L; 27E,
27S; 32L, 32V, 32I; 34E, 34Q, 34K; 38H, 38V, 38I, 38E,
38K; 79L; 91K; 99L, 99E; 102K, 102T, 102V, 102L, 102I,

15. A non-naturally occurring GPA protein according to
claim 14 wherein said substitutions are 14I, 17L, 20L, 27E,
28A, 32L, 34E, 38H, 78F, 79L, 85F, 91K, 102K, 103V, 107I,
109E, 110I, 113L, 116I, 120L, 146K, 147E, 148D, 151I,
153I, 155I, 156L, 164A and 168F (SEQ ID NO: 4).

16. A non-naturally occurring GPA protein according to
claim 14 wherein said substitutions are 17L, 28A, 35I, 78F,
85F, 89F, 92F, 113L, 151I, 153I, and 168F (SEQ ID NO: 9).

102E, 102Q; 107I, 107V, 107L; 109E, 109V, 109D, 109Q;
116I, 116V, 116L, 116E, 116K; 120L; 145Q, 145E; 146L,
146Q; 147E, 147K; 148D, 148A, 148T; 155I; 156L; 164A;
170H, 170D, 170L, 170E, 170Q, and 170K.

26. A non-naturally occurring GPA protein comprising at
least five amino acid substitutions as compared to hG-CSF
protein, wherein at least five of said substitutions are
selected from the group consisting of 14I, 14L; 20L; 27E;
32T; 34E, 34I, 34Q, 34K; 38V, 38I, 38H, 38E, 38K; 145Q,
145E; 146L, 146Q; 147E; 148T, 148A, 148D; 155I; 156L;

17. A non-naturally occurring GPA protein according to
claim 14 wherein said substitutions are 17L, 21I, 28A, 78F,
85F, 110I, 113L, 151I, 153I, 157I, 160W, 161F, 167A, and
168F (SEQ ID NO: 8).

18. A non-naturally occurring GPA protein according to
claim 14 wherein said substitutions are 17L, 28A, 78F, 85F,
113L, 151I, 153I, and 168F (SEQ ID NO: 14).

19. A non-naturally occurring GPA protein comprising the 
amino acid sequence of SEQ ID NO: 10. 



164A; 170H, 170D, 170L, 170E, 170Q, and 170K. 

27. A non-naturally occurring GPA protein comprising at
least five amino acid substitutions as compared to hG-CSF
protein, wherein at least five of said substitutions are
selected from the group consisting of 14I, 14L; 20L; 27E;
32L, 32V, 32I; 34E, 34Q, 34K; 38V, 38I, 38H, 38E, 38K;
145Q, 145E; 146L, 146Q; 147E; 148D, 148A, 148T; 155I;
156L; 164S; 170H, 170L, 170E, and 170Q.
Chain-A: Sequence and Secondary Structure

1 MSYNLLGFLQ RSSNFQCQKL LWQLNGRLEY CLKDRMNFDI PEEIKQLQQF 
HHHHHHHH HHHHHHHHHH HTTS SG GGGG HHHHH 

51 QKEDAALTIY EMLQNIFAIF RQDSSSTGWN ETIVENLLAN VYHQINHLKT 
HHHHHHHHH HHHHHHHHHH TS GGGT HHHHHHHHHH HHHHHHHHHH 

101 VLEEKLEKED FTRGKLMSSL HLKRYYGRIL HYLKAKEYSH CAWTIVRVEI 
HHHHHHTTSS SSSHH. HHHHHHHHHH HHHHHTTT H HHHHHHHHHH 

151 LRNFYFINRL TGYLRN
HHHHHHHHHH HTT

FIG. 1A



Chain-B: Sequence and Secondary Structure 

1 MSYNLLGFLQ RSSNFQCQKL LWQLNGRLEY CLKDRMNFDI PEEIKQLQQF 
HHHHHHHH HHHHHHHHHH HHH S HHHH S 

51 QKEDAALTIY EMLQNIFAIF RQDSSSTGWN ETIVENLLAN VYHQINHLKT 
HHHHHHHHH HHHHHHHHHH HS TTT HHHHHHHHHH HHHHHHHHHH 

101 VLEEKLEKED FTRGKLMSSL HLKRYYGRIL HYLKAKEYSH CAWTIVRVEI 
HHHHTTTTS HHHHHHHH HHHHHHHHHH HHHHHTTT H HHHHHHHHHH 

151 LRNFYFINRL TGYLRN 
HHHHHHHHHH HTT 

FIG. 1B



Human Interferon-Beta Gene Sequence 

1 atgaccaaca agtgtctcct ccaaattgct ctcctgttgt gcttctccac tacagctctt 
61 tccatgagct acaacttgct tggattccta caaagaagca gcaattttca gtgtcagaag 
121 ctcctgtggc aattgaatgg gaggcttgaa tattgcctca aggacaggat gaactttgac 
181 atccctgagg agattaagca gctgcagcag ttccagaagg aggacgccgc attgaccatc 
241 tatgagatgc tccagaacat ctttgctatt ttcagacaag attcatctag cactggctgg 
301 aatgagacta ttgttgagaa cctcctggct aatgtctatc atcagataaa ccatctgaag 
361 acagtcctgg aagaaaaact ggagaaagaa gattttacca ggggaaaact catgagcagt 
421 ctgcacctga aaagatatta tgggaggatt ctgcattacc tgaaggccaa ggagtacagt 
481 cactgtgcct ggaccatagt cagagtggaa atcctaagga acttttactt cattaacaga 
541 cttacaggtt acctccgaaa ctgaagatct cctagcctgt ccctctggga ctggacaatt 
601 gcttcaagca ttcttcaacc agcagatgct gtttaagtga ctgatggcta atgtactgca 
661 aatgaaagga cactagaaga ttttgaaatt tttattaaat tatgagttat ttttatttat 
721 ttaaatttta ttttggaaaa taaattattt ttggtgc 



FIG. 1C
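
As a consistency check on FIG. 1C, the coordinates given in its figure description (bases 1 to 63 for the signaling sequence, bases 64 to 561 for the mature protein, bases 562 to 564 for the stop codon) can be applied to the printed sequence. The following is a minimal Python sketch and not part of the original disclosure: the helper name translate and the variable names are hypothetical, the standard genetic code is assumed, and only the first 93 bases of FIG. 1C are copied in.

```python
# Illustrative sketch only; names are hypothetical, standard genetic code assumed.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Build the codon table in the conventional T, C, A, G ordering.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1, ignoring any trailing partial codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# First 93 bases of FIG. 1C (bases 1-60 from the first row, 61-93 from the second).
GENE_PREFIX = (
    "atgaccaaca" "agtgtctcct" "ccaaattgct" "ctcctgttgt" "gcttctccac"
    "tacagctctt" "tccatgagct" "acaacttgct" "tggattccta" "caa"
)

# The figure description uses 1-based, inclusive coordinates.
signal_peptide = translate(GENE_PREFIX[0:63])    # bases 1 to 63
mature_start = translate(GENE_PREFIX[63:93])     # bases 64 to 93
mature_codons = (561 - 64 + 1) // 3              # bases 64 to 561

print(signal_peptide, mature_start, mature_codons)
```

Under these assumptions the first 63 bases give a 21-residue signaling sequence, the next ten codons give MSYNLLGFLQ (the start of the sequence in FIG. 1A), and bases 64 to 561 span 166 codons, matching the 166 amino acids stated for the mature protein.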
1 MSYNLLGFLQ RSSNFQCQKL LWQLNGRLEY CLKDRMNFDI PEEIKQLQQF

51 QKEDAALTIY EMLQNIFAIF RQDSSSTGWN ETIIENFLAN VYHQIlsfHLKT 

101 VLEEKLEKED FTRGKLMSSL HLKRYYGRIL HYLKAKEYSH CAWTIVRVEI 
151 LRNFYFINRL TGYLRN 
51 QKEDAALTIY EMLQNIFAVF RQDSSSTGWN ETIIENLLAN IYHQINHFKT
101 VLEEKLEKED FTRGKLMASL HIKRYYGRIL HYLKAKEYSH CAWTIIRVEI
151 LRNFYFLNRL AGYLRN

FIG. 6B

1 MSYNLLGFLQ RSYNFQCQKL LWQLNGRLEY CLKDRMNFDI PEEIKQLQQF
51 QKEDAALTIY EMLQNIFAVF RQDSSSTGWN ETIIENLLAN IYHQINHFKT
101 VLEEKLEKED FTRGKLMVSL HVKRYYGRIL HYLKAKEYSH CAWTIIRVEI
151 LRNFYFLNRL AGYLRN

FIG. 6C
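
The substitutions carried by a variant figure can be read off by comparing it with the wild-type sequence position by position. The sketch below is illustrative only and is not part of the original disclosure: list_substitutions and percent_identity are hypothetical helper names, the wild-type string is the FIG. 1B sequence as printed, the variant string is the FIG. 6C sequence as printed, and residue 91 of FIG. 6C is taken to be isoleucine (I).

```python
# Illustrative sketch only; helper names are hypothetical.
WILD_TYPE = (  # FIG. 1B, human IFN-beta, 166 residues
    "MSYNLLGFLQ" "RSSNFQCQKL" "LWQLNGRLEY" "CLKDRMNFDI" "PEEIKQLQQF"
    "QKEDAALTIY" "EMLQNIFAIF" "RQDSSSTGWN" "ETIVENLLAN" "VYHQINHLKT"
    "VLEEKLEKED" "FTRGKLMSSL" "HLKRYYGRIL" "HYLKAKEYSH" "CAWTIVRVEI"
    "LRNFYFINRL" "TGYLRN"
)
FIG_6C = (  # FIG. 6C variant, residue 91 assumed to be I
    "MSYNLLGFLQ" "RSYNFQCQKL" "LWQLNGRLEY" "CLKDRMNFDI" "PEEIKQLQQF"
    "QKEDAALTIY" "EMLQNIFAVF" "RQDSSSTGWN" "ETIIENLLAN" "IYHQINHFKT"
    "VLEEKLEKED" "FTRGKLMVSL" "HVKRYYGRIL" "HYLKAKEYSH" "CAWTIIRVEI"
    "LRNFYFLNRL" "AGYLRN"
)


def list_substitutions(wild_type: str, variant: str) -> list[str]:
    """Return substitutions as wild-type residue, 1-based position, new residue."""
    if len(wild_type) != len(variant):
        raise ValueError("sequences must have equal length (no insertions or deletions)")
    return [
        f"{w}{pos}{v}"
        for pos, (w, v) in enumerate(zip(wild_type, variant), start=1)
        if w != v
    ]


def percent_identity(wild_type: str, variant: str) -> float:
    """Percentage of aligned positions carrying the same residue."""
    same = sum(w == v for w, v in zip(wild_type, variant))
    return 100.0 * same / len(wild_type)


substitutions = list_substitutions(WILD_TYPE, FIG_6C)
identity = percent_identity(WILD_TYPE, FIG_6C)
print(substitutions)
print(f"{identity:.1f}% identical")
```

With these inputs the comparison yields ten substitutions, at positions 13, 69, 84, 91, 98, 118, 122, 146, 157 and 161, which corresponds to roughly 94% identity and is therefore consistent with the "less than about 97% identical" criterion stated in the summary.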

1 MSYNLLGFLQ RSFNFQCQKL LWQLNGRLEY CLKDRMNFDI PEEIKQLQQF 
51 QKEDAALTIY EMLQNIFAIF RQDSSSTGWN ETIIENLLAN IYHQINHFKT
101 VLEEKLEKED FTRGKLMASL HIKRYYGRIL HYLKAKEYSH CAWTIVRVEI 
151 LRNFYFLNRL AGYLRN 
51 QKEDAALTIY EMLQNIFAVF RQDSSSTGWN ETIIENLLAN IYHQINHFKT
101 VLEEKLEKED FTRGKLMASL HIKRYYGRIL HYLKAKEYSH CAWTIIRVEI
151 LRNFYFIiNRL AGYLRN

FIG. 7B

1 MSYNLLGFLQ RSYNFQDQKL LWQLNGRLEY CLKDRMNFDI PEEIKQLQQF
51 QKEDAALTIY EMLQNIFAVF RQDSSSTGWN ETIIENLLAN IYHQINHFKT
101 VLEEKLEKED FTRGKLMVSL mTKRYYGRIL HYLKAKEYSH CAWTIIRVEI
151 LRNFYFIiNRL AGYLRN

FIG. 7C

1 MSYNLLGFLQ RSFNFQDQKL LWQLNGRLEY CLKDRMNFDI PEEIKQLQQF
51 QKEDAALTIY EMLQNIFAIF RQDSSSTGWN ETIIENLLAN IYHQINHFKT
101 VLEEKLEKED FTRGKLMASL HIKRYYGRIL HYLKAKEYSH CAWTIVRVEI
151 LRNFYFLNRL AGYLRN
51 QKEDAALTIY EMLQNIFAIF RQDSSSTGWN ETIIENLLAN IYHQINHLKT

FIG. 8B

1 MSYNLLGFLQ RSANFQCQKL LWQLNGRLEY CLKDRMNFDI PEEIKQLQQF
51 QKEDAALTIY EMLQNIFAIF RQDSSSTGWN ETIIENLLAN IYHQINHLKT
101 VLEEKLEKED FTRGKLMCSL HLKRYYGRIL HYLKAKEYSH CAWTIIRVEI
151 LRNFYFLNRL CGYLRN

FIG. 8C

1 MSYNLLGFLQ RSENFQDQKL LWQLNGRLEY CLKDRMNFDI PEEIKQLQQF
51 QKEDAALTIY EMLQNIFAIF RQDSSSTGWN ETIIENLLAN IYHQINHLKT
101 VLEEKLEKED FTRGKLMCSL HLKRYYGRIL HYLKAKEYSH CAWTIVRVEI
151 LRNFYFINRL CGYLRN
FIG. 9B

1 MSYNLLGFLQ RSENFQDQKL LWQLNGRLEY CLKDRMNFDI PEEIKQLQQF
51 QKEDAALTIY EMLQNIFAIF RQDSSSTGWN ETIIENLLAN IYHQINHLKT
101 VLEEKLEKED FTRGKLMASL HLKRYYGRIL HYLKAKEYSH CAWTIIRVEI
151 LRNFYFIiNRL TGYLRN

FIG. 9C

1 MSYNLLGFLQ RSENFQDQKL LWQLNGRLEY CLKDRMNFDI PEEIKQLQQF
51 QKEDAALTIY EMLQNIFAIF RQDSSSTGWN ETIIENLLAN IYHQINHLKT
101 VLEEKLEKED FTRGKLIVUlSL HIKRYYGRIL HYLKAKEYSH CAWTIVRVEI
151 LRNFYFIiNRL AGYLRN
FIG. 10A



1 MSYNLLGFLQ RSSNFQCQKL LWQLNGRLEY CLKDRMNFDI PEEIKQLQQF 

51 QKEDAALTIY EMLQNIFAIF RQDSSSTGWN ETIIENFLAN VYHQINHLKT 

101 VLEEKLEKED FTRGKLMSSL HLKRYYGRIL HYLKAKEYSH CAWTIVRVEI 

151 LRNFYFINRL TGYLRN 
RECOMBINANT INTERFERON-BETA specific molecules are responsible for induction, but double-stranded

MUTEINS RNA and cytokines can be good inducers. There is

much overlap between different cell types in both the
This application is a continuing application of U.S. Ser. inducers and the species of IFN that is induced. The major
No. 60/133,785, filed May 12, 1999. cell types that produce IFNs are: lymphocytes, monocytes

and macrophages (for IFN-α); fibroblasts and some epithelial
FIELD OF THE INVENTION cells and lymphoblastoid cells (for IFN-β); and activated

n,e invention relates to novel interfeK.n-beta activity ^ Kv^''/^?^ >- ••, ••• . 

(IbA) proteins and nucleic acids. TTie invention further ,pS/iJ '?n .1. f .^"''".'T' ''?'""" 

relates to the use of the IbA proteins in the treatment of 'f'^BM k J "'f ^ the biological consequences 

interferon-beta (INF-B) related disorders °^ '° '^'^P'°'' inhibition of cell 

prohfcration, induction of cell differentiation, changes in 

BACKGROUND OF THE INVENTION '^"^^ morphology, enhancement of histocompatibility antigen 

H„m,n , ncM \ u r u exprcssion on many cells and stimulation of 

Human Interferons (IFNs) are members of a biologically immunoglobulin-Fc receptor expression on macrophages. B

potent family of cytokines. Originally, IFNs were identified lymphocytes can be induced to increase antibody production

as agents produced and secreted by virus-infected cells by low concentration of IFN-α or IFN-β. An additional

which can protect cells against further viral infections. effect of IFN-α and IFN-β is activation of natural killer cells

ma3er.h 'l Tr^- "^""'\ T 'h^' -"^y -^""^i"'' 'he destruction of virus-infected 

many other changes in cellular behavior including effects on cells or tumor cells in vivo. Overall, IFNs seem to be of great

cellular growth and differentiation and modulation of the importance as part of the body's defense against foreign

'^^cTJ^fZoH' F-^^' ^"r."- V- ."'"f °f8«"i'""s. foreign antigens and abnormal cell types 

ii -lit^^vi? 1 n Q7«( r Tovey, Biochm.. Biophys. (Clemens, in Cytokines, BIOS Scientific Publishets Limited, 

1^4 7; ?ri£^^^' ^' - •'• '*^(^) 25 Cytokines, WUey, New York, 1988). 

mmB^SJ::^^^^^^ lNF.«andlF^-pweream;ng.hefirs.ofthecytokinesto 

ihlLii nroni^ °c ,PM ^ T ' fi i ^ " . ^ ^y recombinant DNA technology. For example, 

?W a iZcvteTlS%fiLlrsf«^^ '""'^^ ""^ -'J-"- °f human IFN-P 

S^waifS-rf^^^^^^^^ [Tanaguchietal.,GenelO(l).ll-15(1980);Houghtonetal': 

Lc^iewan, j. intecl. iJis 1444).643 (1980)J. Nucleic Acids Res. 8(13):2885-94 (1980)] made it possible 

In humans, the IFN-α subtype encompasses a multigene to produce recombinant human IFN-β in e.g., mammalian,

family of about 20 genes, encoding proteins of 166-172 insect, and yeast cells and in E. coli, that is free from viruses

amino acids that are all closely related. In contrast to this and other contaminants from human sources [e.g., Ohno and

diversity, there is only one human interferon-beta (IFN-β) Taniguchi, Nucleic Acids Res. 10(3):967-77 (1982); Smith

gene, also encoding a protein of 166 amino acids. IFN-β has et al., Mol. Cell. Biol. 3(12):2156-65 (1983); Demolder et

low homology to the IFN-α family and is an N-linked al., J. Biotechnol. 32(2):179-89 (1994); Dorin et al., U.S.

^c5;^T.°,".?i^'^'' ^^"^^ ^^(2) Pat. No. 5,814,485 (1998); Konrad et al., U.S. Pat. No! 

:520-523 (1976)]. There is also only one human IFN-y gene 4 450 103 (1984)] 

that encodes a polypeptde of 143 amino acids that is 'ipNs have been shown to have therapeutic value in 

finw!^l 1 Ht T' -t ^^N-Y 40 conditions such as inflammatory, viral, and malignant dis- 

shows only slight structural similarities to IFN-a or to eases [e.g., see Desmyter et al, Lancet 2(7987):645-7 

(1976); Makower and Wadler, Semin. Oncol. 26(6):663-71 

All IFN-α and IFN-β (also commonly referred to as type (1999); Sturzebecher et al., J. Interferon Cytokine Res.

I interferon family) appear to bind to a common high affinity 19(11):1257-64 (1999); Zein, Cytokines Cell. Mol. Ther.

cell surface receptor, a 130 kD glycoprotein that is widely 4(4):229-41 (1998); Musch et al., Hepatogastroenterology

distributed on different cell types and that is distinct from the 45(24):2282-94 (1998); Wadler et al., Cancer J. Sci. Am.

one bound by IFN-γ. Type-I interferons are recognized by a 4(5):331-7 (1998)]. IFN-β is a marketed drug (Betaseron,

complex containing the receptor subunits ifnar1 and ifnar2 manufactured by Berlex, and Avonex, manufactured by

and their associated Janus tyrosine kinases, Tyk2 and Jak1, Biogen) that has been approved for use in treatment of

that activate the transcription factors STAT1 and STAT2, multiple sclerosis (MS) [Arnason, Biomed. Pharmacother.

leading to the formation of the transcription factor complex 53(8):344-50 (1999); Comi et al., Mult. Scler. 1(6):317-20

ISGF3 [interferon-stimulated gene factor 3; Li et al., Biochemie (1996); Kappos, Lancet 353(9171):2242-3 (1999)]. IFN-β

80(8-9):703-20 (1998); Nadeau et al., J. Biol. seems to reduce the number of attacks suffered by patients

Chem. 274(7):4045-52 (1999)]. Three distinct modes of with relapsing and remitting MS. Betaseron, a recombinant

IFN/receptor complex interaction are known: (i) IFN-α with IFN-β expressed in E. coli, consists of 165 amino acids

ifnar1 and ifnar2; (ii) IFN-β with ifnar1 and ifnar2; and (iii) (missing the initial methionine) and is genetically engineered

IFN-β with ifnar2 alone [Lewerenz et al., J. Mol. Biol. so that it contains a serine at position 17, to replace

282(3):585-99 (1998)]. While Lewerenz et al. suggest that a cysteine. It is a nonglycosylated form of IFN-β. Avonex is

IFN-α and IFN-β interact with their receptors in different a human IFN-β, consisting of 166 amino acids, that is

ways and as such may also signal differently, the events produced by recombinant DNA techniques in CHO cells.

responsible for biological activity beyond receptor binding This is a glycosylated form of IFN-β. Also, recent studies

are poorly understood. show promising IFN efficacy in treating certain viral

As might be predicted for such a large family of cytokines diseases, such as Hepatitis B or C, and cancer.

with almost ubiquitously distributed receptors, IFNs display Most cytokines, including IFN-β, have relatively short

varied physiological roles. Production of IFN-α or IFN-β is circulation half-lives since they are produced in vivo to act

induced by infection, including viral infection or the presence locally and transiently. To use IFN-β as an effective systemic

of foreign cell types and antigens. It is not clear what therapeutic, one needs relatively large doses and frequent
administrations. Frequent parenteral administrations are 5:895-903 (1996); Dahiyat et al., Science 278:82-87

inconvenient and painful. Further, toxic side effects are (1997); Dahiyat et al., J. Mol. Biol. 273:789-96; Dahiyat et

Zt^'^^t Zuu^!^'^ administration which are so severe al.. Protein Sci. 6:1333-1337 (1997); Jones, Protein Science 

re t^e^ T^^^^L rf ^'''"'k m"""*^ 3:567-574 (1994); Konoi, et al, Pmteins: Structure, Func 
treatment. These side effects are probably associated with 5 »ion and Genetic l0-94i_9^^ naoAW tt,.c. .t«' 

administration of a high dosage. In clinical studies it has In.iHer th. c^^^^^^^^ ^ ^^^^^^S^^^^^^ 

been found that some patients produce antibodies to IFN-p, T^'^V^' T ^1 T'' «^«^Pl^°^^"^^"ty 

which neutralize its biological activity ofsidechams by explicitly modeling the atoms of sequences 

Furthermore, it has been observed that dimers and oHgo- sf NoT^^^^^^^^^ dL^'h""'"' W098/47089, and U.S. 

mers of microbially produced IFN-p are formed in £. coli u' u^''* 09/127 926 descnbe a system for protem design; 

renderingpurification andseparation of IFN-p laboriousTnd ^""'^ '"P^^^^.^ incorporated by reference, 

time consuming. It also necessitates several additional steps ^. ^.^^^ ^^^^\ ^^^^^ proteins exhibiting both significant 

in purification and isolation procedures such as reducing the stability and interferon-beta activity. Accordingly, it is an 

protein during purification and reoxidizing it to restore it to "^^^^ invention to provide interferon-beta activity 

its original conformation, thereby increasing the possibility ^^^^^ protems, nucleic acids and antibodies for the treatment 

of incorrect disulfide bond formation. In addition, and most multiple sclerosis, cancer and viral infections.

likely attributable to the above-listed shortcomings, microbially

produced recombinant human IFN-β has also been SUMMARY OF THE INVENTION

found to exhibit consistently low specific activity. It would In accordance with the objects outlined above, the present

be desirable, therefore, to microbially produce a biologically invention provides non-naturally occurring interferon-beta

active IFN-β protein that has a reduced or eliminated ability activity (IbA) proteins (e.g. the proteins are not found in

to form intermolecular crosslinks or intramolecular bonds nature) comprising amino acid sequences that are less than

that cause the protein to adopt an undesirable structure. about 97% identical to human IFN-β. The IbA proteins have

To this end, variants of IFN-β sequences, applications and altered biological property of an IFN-β protein; for
production procedures are known; see for example U.S. Pat. example, the IbA proteins will be more stable than IFN-β

Nos. 4,450,103; 4,518,584; 4,588,585; 4,737,462; 4,738,844; and bind to cells comprising an interferon receptor complex.

4,738,845; 4,753,795; 4,769,233; 4,793,995; 4,914,033; The invention provides IbA proteins with amino acid

4,959,314; 5,183,746; 5,376,567; 5,545,723; 5,730,969; sequences that have at least about 3-5 amino acid substitutions

5,814,485; 5,869,603 and references cited therein. compared to the IFN-β sequence shown in FIG. 1

Recently, the crystal structures of recombinant murine (SEQ ID NO:1).
IFN-β [Senda et al., EMBO J. 11(9):3193-201 (1992); Mitsui In a further aspect, the present invention provides non-naturally
et al., Pharmacol. Ther. 58(1):93-132 (1993); Senda et al., J. occurring IbA conformers that have three dimensional
Mol. Biol. 253(1):187-207 (1995); Mitsui et al., J. Interferon backbone structures that substantially correspond to
Cytokine Res. 17(6):319-26 (1997); all of which are the three dimensional backbone structure of IFN-β. In one
expressly incorporated by reference] and human IFN-β aspect, the three dimensional backbone structure of the IbA
[Karpusas et al., Proc. Natl. Acad. Sci. U.S.A. 94(22):11813-18 conformer corresponds substantially to the three dimensional
(1997); Runkel et al., Pharm. Res. 15(4):641-9 backbone structure of the A-chain of IFN-β. In another
(1998); Runkel et al., 273(14):8003 (1998); Lewerenz et al., aspect, the three dimensional backbone structure of the IbA
J. Mol. Biol. 282(3):585-99 (1998); all of which are conformer corresponds substantially to the three dimensional
expressly incorporated by reference] have been solved. backbone structure of the B-chain of IFN-β. The
Karpusas et al. determined the crystal structure of glycosylated amino acid sequence of the IbA conformer and the amino
human IFN-β at 2.2 Angstrom resolution by molecular acid sequence of IFN-β are less than about 97% identical. In
replacement. The molecule adopts a fold similar to that of one aspect, at least about 90% of the non-identical amino
the previously determined structures of murine IFN-β and acids are in a core region of the conformer. In other aspects,
human IFN-α2b, but displays several distinct structural the conformer have at least about 100% of the non-identical
features. Like human IFN-α2b, IFN-β contains a zinc-binding amino acids are in a core region of the conformer.
site at the interface of the two molecules in the In an additional aspect, the changes are selected from the
asymmetric unit; however, unlike human IFN-α2b, IFN-β amino acid residues at positions selected from positions 6,
dimerizes with contact surfaces from opposite sides of the 13, 17, 21, 56, 59, 61, 62, 63, 66, 69, 84, 87, 91, 98, 102, 114,
molecule. Runkel et al. reported structural and functional 118, 122, 129, 146, 150, 154, 157, 160, and 161. In a
differences between glycosylated (IFN-β-1a) and non-glycosylated preferred aspect, the changes are selected from the amino
(IFN-β-1b) forms of human IFN-β and suggested acid residues at positions selected from positions 13, 17, 56,
that the greater biological activity of IFN-β-1a is due 59, 63, 66, 69, 84, 87, 91, 98, 114, 118, 122, 146, 157, and
to the stabilizing effect of the carbohydrate moiety. 161. In one aspect, the changes are selected from the amino
The available crystal structure of IFN-β allows further acid residues at positions selected from positions 13, 17, 69,
protein design and the generation of more stable proteins or 91, 98, 118, 122, 146, 157, and 161. In another
protein variants with an altered activity. Several groups have aspect, the changes are selected from the amino acid residues
applied and experimentally tested systematic, quantitative at positions selected from positions 13, 17, 56, 84, 87,
methods to protein design with the goal of developing 91, 114, 118, 122, and 161. Preferred embodiments include
general design algorithms (Hellinga et al., J. Mol. Biol. 222: at least about 3-5 variations.

763-785 (1991); Hurley et al., J. Mol. Biol. 224:1143-1154 In a further aspect, the invention provides recombinant

(1992); Desjarlais et al., Protein Science 4:2006-2018 nucleic acids encoding the non-naturally occurring IbA

(1995); Harbury et al., Proc. Natl. Acad. Sci. U.S.A. proteins, expression vectors comprising the recombinant

92:8408-8412 (1995); Klemba et al., Nat. Struct. Biol. nucleic acids, and host cells comprising the recombinant

2:368-373 (1995); Nautiyal et al., Biochemistry nucleic acids and expression vectors.

34:11645-11651 (1995); Betz et al., Biochemistry In an additional aspect, the invention provides methods of

35:6955-6962 (1996); Dahiyat et al., Protein Science producing the IbA proteins of the invention comprising
culturing host cells comprising the recombinant nucleic A-chain IFN-β core 2 sequences (only the amino acid

acids under conditions suitable for expression of the nucleic residues of positions 1, 6, 10, 14, 17, 21, 38, 50, 55, 56, 58,

acids. The proteins may optionally be recovered. In a further 59, 61, 62, 63, 66, 69, 70, 81, 84, 87, 91, 94, 95, 98, 102, 115,

aspect, the invention provides pharmaceutical compositions 122, 125, 126, 129, 130, 133, 138, 144, 146, 147, 150, 151,

comprising an IbA protein of the invention and a pharmaceutical 153, 154, 157, 159, 160, 161, 163, and 164 are given). All

carrier. values are given in %. For example, at position 91, the

In an additional aspect, the invention provides methods ^^^'^ ^^^^"^ ^^^^ 1) (SEQ ID 

for treating an ir4Fp responsive condition comprising N0:1); m IbA protems, 8 1.7% of the top 1000 sequences had 

administering an IbA protein of the invention to a patient tf^^^ ^^^^^.f°f^*^°,°' ^°»y 11,5% of the sequences 

The INFP condition includes multiple sclerosis' viral lO fif "Lnlf/^J^l ^^^^^ 98 (leucme m human 

infection, or cancer. J™^^ phenylalanme (68.8%) is preferred over leucine 

BRIEF DESCRIPTION OF TIIE DRAWINGS u ^^^9' ?^S,IP.^^*9 ^^^H^ ^ preferred IbAsequence 

based on the PDA analysis of IFN-p A-chain core 2 
FIG. 1A (SEQ ID NO:1) depicts the amino acid sequence sequence. Amino acid residues different from the human
of the A-chain of human IFN-β as used in the determination IFN-β (see FIG. 1) (SEQ ID NO:1) are shown in bold and

of the crystal structure [PDB and GenBank # 1AU1; Karpusas underlined.

et al., Proc. Natl. Acad. Sci. U.S.A. 94(22):11813-18 FIG. 6A depicts the mutation pattern of IFN-β A-chain
(1997)] and secondary structure elements. Secondary structure core 3 sequences based on the analysis of the lowest 1000
element legend: H, alpha helix (4-helix); B, residue in protein sequences generated by Monte Carlo analysis of
isolated beta bridge; E, extended strand, participates in beta A-chain IFN-β core 3 sequences (only the amino acid
ladder; G, 3-10 helix (3-helix); I, pi helix (5-helix); T, residues of positions 1, 6, 10, 13, 14, 17, 18, 21, 38, 50, 55,
hydrogen bonded turn; S, bend. 56, 58, 59, 61, 62, 63, 66, 69, 70, 72, 74, 76, 77, 81, 84, 87,

FIG. IB (SEQ ID N0:1) depicts the amino acid sequences 114, 115, 118, 122, 125, 126, 129^ 

of the B<hain of human INF-p as used in the determination 25 1^^' 1^^' l^"^' 1^^' 1^^' 1^^' 1"^^' 1"^^* 1"^' 146» 147, 150, 
of the crystal structure (Karpusas et al., supra) and second- 1^1' 1^^' 1^4, 157, 159, 160, 161, 163, and 164 are given), 
ary structure elements. values are given in %. For example, at position 13, the 

FIG. 1C (SEQ ID NO:2) depicts the complete DNA human IFN-β amino acid is serine (see FIG. 1) (SEQ ID NO:
sequence encoding wild type human IFN-β (GenBank 1); in IbA proteins, 67.7% of the top 1000 sequences had
accession number NM_002176). The encoded sequence phenylalanine at this position and 31.4% of the sequences
consists of the signaling sequence, MTNKCLLQIALLLCFSTTALS ^^'^T' -^T" sequences had serine at this

(SEQ ID NO:3), and the 166 amino acids that ^'' h'^''''- ^- ""^''^f ^^^^^^^^^ ^.^'^^
constitute the actual protein (see FIGS. 1A and 1B) (SEQ ID '"'i % ''rT ^"^'inin* ^ ^^^^ ^? ^?^^^ IbA proteins,
NO:1). The DNA sequence of 757 nucleotides includes this ^ nL V? sequences had alanine at this position
coding sequence and a non-translated region. Bases 1 to 63 10.9% of the sequences had tyrosine. None of the IbA
encode the signaling sequence; bases 64 to 561 encode the "^"'^^^ ^l^TJ'

actual IFN-β; bases 562 to 564 (TGA) are the stop codon; and ^ ^^^^ ^^"^^ ^^P'^^^ ^ preferred IbAsequence

the rest is untranslated sequence. based on the PDA analysis of IFN-β A-chain core 3

no. 2 depicts the stmcture of wild type IFN-p. Presented ^T^^^'^l'r °1US^^^ '^l iT'^ 

is the A-chain from the PDB file lAUl The amino acid side ^0 1^,^,^^^ ^^^^ ^^'^^ ^" 

chains indicated are those positions included in the PDA . „ xr/^ ^ m j • 

design of CORE 1 (^^^ NO:7-8) depict preferred 

- , . r u *u ... A . . IbA sequences based on the PDA analysis of IFN-P A-chain 

R !J?nnflE t' ThV pn/ 3 o°ly by the direct MC 

B-chain of IFN-β selected for PDA. The individual sets are calculation following DEE, but also those after cleaning the

described in detail herein. MC list (C) and when running MC over the complete

FIG. 4A depicts the mutation pattern of IFN-β A-chain sequence space starting from the ground state generated by

core 1 sequences based on the analysis of the lowest 1000 the direct MC calculation (D). Amino acid residues different

protein sequences generated by Monte Carlo analysis of from the human IFN-β (see FIG. 1) (SEQ ID NO:1) are

A-chain IFN-β core 1 sequences (only the amino acid shown in bold and are underlined.

residues of positions 6, 21, 55, 56, 59, 62, 63, 66, 69, 84, 87, FIG. 7A depicts the mutation pattern of IFN-β A-chain

91, 98, 122, 129, 133, 146, 150, 157, and 160 are given). All core 4 sequences based on the analysis of the lowest 1000

values are given in %. For example, at position 87, the protein sequences generated by Monte Carlo analysis of

human IFN-β amino acid is leucine (see FIG. 1) (SEQ ID A-chain IFN-β core 4 sequences. See FIG. 6A for details of

NO:1); in IbA proteins, 78.7% of the top 1000 sequences had figure legend. For example, at position 17, the human IFN-β

phenylalanine at this position, and only 18.4% of the amino acid is cysteine (see FIG. 1) (SEQ ID NO:1); in IbA

sequences had leucine. Similarly, for position 84 (valine in proteins, 82.9% of the top 1000 sequences had aspartic acid

human IFN-β), isoleucine (40.5%) and leucine (39.4%) are at this position, 7.1% had threonine, 4.5% had alanine, 4.1%

preferred over valine (19.6%). had leucine and 1.4% had valine. None of the IbA sequences

FIG. 4B (SEQ ID NO:4) depicts a preferred IbA sequence had cysteine at this position.

based on the PDA analysis of IFN-β A-chain core 1 FIG. 7B (SEQ ID NO:9) depicts a preferred IbA sequence

sequence. Amino acid residues different from the human based on the PDA analysis of IFN-β A-chain core 4

IFN-β (see FIG. 1) (SEQ ID NO:1) are shown in bold and sequence. Amino acid residues different from the human

are underlined. IFN-β (see FIG. 1) (SEQ ID NO:1) are shown in bold and

FIG. 5A depicts the mutation pattern of IFN-β A-chain are underlined.

core 2 sequences based on the analysis of the lowest 1000 FIG. 7C and FIG. 7D (SEQ ID NOS:10-11) depict

protein sequences generated by Monte Carlo analysis of preferred IbA sequences based on the PDA analysis of IFN-β
A-chain core 4 sequence, generated not only by the direct 1 sequence. Amino acid residues different from the human

MC calculation following DEE, but also those after cleaning IFN-β (see FIG. 1) (SEQ ID NO:1) are shown in bold and are

the MC list (C) and when running MC over the complete underlined.

sequence space starting from the ground slate eenerated bv ci,^ haj . .. „ „ 

the direct MC calculation (D). Amino acid residues different c ^'^^f Pf "^"^ f ^^'P ^-^^ain 

from the human IFN-p (see FIG. 1) (SEQ ID NO l) are """'^ ^ sequences based on the analysis of the lowest 1000 

shown in bold and are underlined * P^^*^^° sequences generaied by Monte Carlo analysis of 

FIG. 8A depicts the mutation pattern of IFN-B A-chain ^"^^^'^ T^'^- ^""'^ ^ sequences (only the amino acid 

core 5 sequences based on the analysis of the lowest 1000 [TtTl^ l'^'^'''''^ 

protein sequences generated by Monte Carlo analysis of ^4, 95, 98, 102, 115, 

A-chain IFN-p core 5 sequences. See FIG. 6A for details of ^^^^ ^^5, 126, 129, 130, 133, 138, 144, 146, 147, 150, 151, 

figure legend. For example, at position 84, the human IFN-p ^^^^ ^^l* 163, and 164 are given). All 

amino acid is valine (see FIG. 1) (SEQ ID NO:1); in IbA values are given in %. For example, at position 56, the

proteins, 99.5% of the top 1000 sequences had isoleucine at human IFN-β amino acid is alanine (see FIG. 1) (SEQ ID

this position and 0.5% had leucine. None of the IbA NO:1); in IbA proteins, 97.6% of the top 1000 sequences had

sequences had valine at this position. leucine at this position, and only 2.4% of the sequences had

FIG. 8B (SEQ ID NO:12) depicts a preferred IbA alanine. Similarly, for position 91 (valine in human IFN-β),

sequence based on the PDA analysis of IFN-β A-chain core isoleucine (68.5%) and leucine (27.7%) are preferred over

5 sequence. Amino acid residues different from the human valine (3.8%).

IFN-β (see FIG. 1) (SEQ ID NO:1) are shown in bold and FIG. 11B (SEQ ID NO:19) depicts a preferred IbA
are underlined. sequence based on the PDA analysis of IFN-β B-chain core

FIG. 8C and FIG. 8D (SEQ ID NOS:13-14) depict 2 sequence. Amino acid residues different from the human

preferred IbA sequences based on the PDA analysis of IFN-β IFN-β (see FIG. 1) (SEQ ID NO:1) are shown in bold and

A-chain core 5 sequence, generated not only by the direct are underlined.

MC calculation following DEE, but also those after cleaning FIG. 12A depicts the mutation pattern of IFN-β B-chain

the MC list (C) and when running MC over the complete core 3 sequences based on the analysis of the lowest 1000

sequence space starting from the ground state generated by protein sequences generated by Monte Carlo analysis of

the direct MC calculation (D). Amino acid residues different B-chain IFN-β core 3 sequences (only the amino acid

from the human IFN-β (see FIG. 1) (SEQ ID NO:1) are residues of positions 1, 6, 10, 13, 14, 15, 17, 21, 38, 50, 55,

shown in bold and are underlined. 56, 58, 59, 61, 62, 63, 66, 69, 70, 72, 74, 76, 77, 81, 84,

FIG. 9A depicts the mutation pattern of IFN-β A-chain 87, 90, 91, 94, 95, 98, 102, 114, 115, 118, 122, 125, 126, 129,

core 6 sequences based on the analysis of the lowest 1000 130, 132, 133, 136, 138, 139, 142, 143, 144, 146, 147, 150,

protein sequences generated by Monte Carlo analysis of 151, 153, 154, 157, 159, 160, 161, 163, and 164 are given).

A-chain IFN-β core 6 sequences. See FIG. 6A for details of All values are given in %. For example, at position 13, the

figure legend. For example, at position 118, the human human IFN-β amino acid is serine (see FIG. 1); in IbA

IFN-β amino acid is serine (see FIG. 1) (SEQ ID NO:1); in proteins, 92.7% of the top 1000 sequences had leucine at this

IbA proteins, 100% of the top 1000 sequences had alanine. position and 7.3% of the sequences had alanine. None of the

None of the IbA sequences had serine at this position. IbA sequences had serine at this position. Similarly, at

FIG. 9B (SEQ ID NO: 15) depicts a preferred IbA position 118, the human IFN-p amino acid is serine (see 

sequence based on the PDA analysis of IFN-P A-chain core F^G. 1) (SEQ ID N0:1); in IbA proteins, 100% of the top 

6 sequence. Amino acid residues different from the human sequences had leucine at this position. 

IFN-p (see FIG. 1) (SEQ ID N0:1) are shown in bold and FIG. 12B (SEQ ID NO:20) depicts a preferred IbA 

are underlined. sequence based on the PDA analysis of IFN-p B-chain core 

FIG. 9C and FIG. 9D (SEQ ID NOS: 16-17) depict 3 sequence. Amino acid residues different from the human 

preferred IbAsequenccsbased on the PDAanalysisoflFN-P 45 IFN-P (see FIG. 1) (SEQ ID N0:1) are shown in bold and 

A-chain core 6 sequence, generated not only by the direct are underlined. 

MC calculation following DEE, but also those after cleaning FIG. 13A depicts the mutation pattern of IFN-p B-chain 

the MC list (C) and when running MC over the complete core 4 sequences based on the analysis of the lowest 1000 

sequence space starting from the ground state generated by protein sequences generated by Monte Carlo analysis of 

the direct MC calculation (D). Amino acid residues different 50 B-chain IFN-p core 4 sequences. See FIG. 12A for details of 

from the human IFN-p (see FIG. 1) (SEQ ID N0:1) are figure legend. For example, at position 56, the human IFN-p 

shown in bold and are underlined. amino acid is alanine (see FIG. 1) (SEQ ID N0:1); in IbA 

FIG. lOA depicts the mutation pattern of IFN-P B-chain proteins, 97.7% of the top 1000 sequences had leucine at this 

core 1 sequences based on the analysis of the lowest 1000 position and only 2.3% had alanine. Similarly, at position 

protein sequences generated by Monte Carlo analysis of 55 114, the human IFN-p amino acid is glycine (see FIG. 1) 

B-chain IFN-P core 1 sequences (only the amino acid (SEQ ID N0:1); in IbA proteins, 100% of the top 1000 

residues of positions 6, 21, 55, 56, 59, 62, 63, 66, 69, 84, 87, sequences had phenylalanine at this position. 

91,98,122,129, 133, 146, 150, 157, and 160 are given). All FIG. 13B (SEQ ID N0:21) depicts a prefeaed IbA 

values are given in %. For example, at position 87, the sequence based on the PDA analysis of IFN-p B-chain core 

human IFN-P amino acid is leucine (see FIG. 1) (SEQ ID 60 4 sequence. Amino acid residues different from the human 

N0:1); in IbA proteins, 74.6% of the top 1000 sequences had IFN-p (see FIG. 1) (SEQ ID N0:1) are shown in bold and 

phenylalanine at this position, and only 21.5% of the are underlined. 

sequences had leucine. Similarly, for position 84 (valine in FIG. 14A depicts the mutation pattern of IFN-p B-chain 

human IFN-p), isoleucme (62.3%) is preferred over valine core 5 sequences based on the analysis of the lowest 1000 

(25.4%). g5 protein sequences generated by Monte Carlo analysis of 

FIG. lOB (SEQ ID N0:18) depicts a preferred IbA B-chain IFN-P core 5 sequences (only the amino acid 

sequence based on the PDA analysis of IFN-p B-chain core residues of positions 1, 6, 10, 13, 14, 17, 18, 21, 38, 50, 55, 



US 6,514,729 B1


56, 58, 59, 61, 62, 63, 66, 69, 70, 72, 74, 76, 77, 81, 84, 87, 90, 91, 94, 95, 98, 102, 114, 115, 118, 122, 125, 126, 129, 130, 132, 133, 136, 138, 139, 142, 143, 144, 146, 147, 150, 151, 153, 154, 157, 159, 160, 161, 163, and 164 are given). For example, at position 56, the human IFN-β amino acid is alanine (see FIG. 1) (SEQ ID NO:1); in IbA proteins, 97.6% of the top 1000 sequences had leucine at this position and only 2.4% had alanine. Similarly, at position 114, the human IFN-β amino acid is glycine (see FIG. 1) (SEQ ID NO:1); in IbA proteins, 100% of the top 1000 sequences had leucine at this position.

FIG. 14B (SEQ ID NO:1) depicts a preferred IbA sequence based on the PDA analysis of IFN-β B-chain core 5 sequence. Amino acid residues different from the human IFN-β (see FIG. 1) (SEQ ID NO:1) are shown in bold and are underlined.

FIG. 15A depicts the mutation pattern of IFN-β B-chain core 6 sequences based on the analysis of the lowest 1000 protein sequences generated by Monte Carlo analysis of B-chain IFN-β core 6 sequences. See FIG. 14A for details of figure legend. For example, at position 118, the human IFN-β amino acid is serine (see FIG. 1) (SEQ ID NO:1); in IbA proteins, 99.4% of the top 1000 sequences had glutamic acid at this position and 0.6% had alanine. None of the IbA sequences had serine at this position. Similarly, for position 161 (threonine in human IFN-β), glutamic acid (86.4%) is preferred over threonine (12.1%).

FIG. 15B (SEQ ID NO:23) depicts a preferred IbA sequence based on the PDA analysis of IFN-β B-chain core 6 sequence. Amino acid residues different from the human IFN-β (see FIG. 1) (SEQ ID NO:1) are shown in bold and are underlined.

FIG. 16A depicts the mutation pattern of IFN-β B-chain core 7 sequences based on the analysis of the lowest 1000 protein sequences generated by Monte Carlo analysis of B-chain IFN-β core 7 sequences. See FIG. 14A for details of figure legend. For example, at position 17, the human IFN-β amino acid is cysteine (see FIG. 1) (SEQ ID NO:1); in IbA proteins, 32.8% of the top 1000 sequences had threonine at this position, 31% had alanine, 29% had aspartic acid, 5% had glutamic acid, 1.4% had serine, and 0.8% had glycine. None of the IbA sequences had cysteine at this position.

FIG. 16B (SEQ ID NO:24) depicts a preferred IbA sequence based on the PDA analysis of IFN-β B-chain core 7 sequence. Amino acid residues different from the human IFN-β (see FIG. 1) (SEQ ID NO:1) are shown in bold and are underlined.

FIG. 17 depicts the synthesis of a full-length gene and all possible mutations by PCR. Overlapping oligonucleotides corresponding to the full-length gene (black bar, Step 1) and comprising one or more desired mutations are synthesized, heated and annealed. Addition of DNA polymerase to the annealed oligonucleotides results in the 5' to 3' synthesis of DNA (Step 2) to produce longer DNA fragments (Step 3). Repeated cycles of heating, annealing, and DNA synthesis (Step 4) result in the production of longer DNA, including some full-length molecules. These can be selected by a second round of PCR using primers (indicated by arrows) corresponding to the end of the full-length gene (Step 5).

FIG. 18 depicts a preferred scheme for synthesizing an IbA library of the invention. The wild type gene, or any starting gene, such as the gene for the global minima gene, can be used. Oligonucleotides comprising sequences that encode different amino acids at the different variant positions (indicated in the Figure by box 1, box 2, and box 3) can be used during PCR. Those primers can be used in combination with standard primers. This generally requires fewer oligonucleotides and can result in fewer errors.

FIG. 19 depicts an overlapping extension method. At the top of FIG. 19 is the template DNA showing the locations of the regions to be mutated (black boxes) and the binding sites of the relevant primers (arrows). The primers R1 and R2 represent a pool of primers, each containing a different mutation; as described herein, this may be done using different ratios of primers if desired. The variant position is flanked by regions of homology sufficient to get hybridization. Thus, as shown in this example, oligos R1 and F2 comprise a region of homology and so do oligos R2 and F3. In this example, three separate PCR reactions are done for Step 1. The first reaction contains the template plus oligos F1 and R1. The second reaction contains template plus oligos F2 and R2, and the third contains the template and oligos F3 and R3. The reaction products are shown. In Step 2, the products from Step 1 tube 1 and Step 1 tube 2 are taken. After purification away from the primers, these are added to a fresh PCR reaction together with F1 and R4. During the denaturation phase of the PCR, the overlapping regions anneal and the second strand is synthesized. The product is then amplified by the outside primers, F1 and R4. In Step 3, the purified product from Step 2 is used in a third PCR reaction, together with the product of Step 1, tube 3 and the primers F1 and R3. The final product corresponds to the full length gene and contains the required mutations. Alternatively, Step 2 and Step 3 can be performed in one PCR reaction.

FIG. 20 depicts a ligation of PCR reaction products to synthesize the libraries of the invention. In this technique, the primers also contain an endonuclease restriction site (RE), either generating blunt ends, 5' overhanging ends or 3' overhanging ends. We set up three separate PCR reactions for Step 1. The first reaction contains the template plus oligos F1 and R1. The second reaction contains template plus oligos F2 and R2, and the third contains the template and oligos F3 and R3. The reaction products are shown. In Step 2, the products of Step 1 are purified and then digested with the appropriate restriction endonuclease. The digestion products from Step 2, tube 1 and Step 2, tube 2 are ligated together with DNA ligase (Step 3). The products are then amplified in Step 4 using oligos F1 and R4. The whole process is then repeated by digesting the amplified products, ligating them to the digested products of Step 2, tube 3, and amplifying the final product using oligos F1 and R3. It would also be possible to ligate all three PCR products from [illegible] restriction [illegible].

FIG. 21 depicts blunt end ligation of PCR products. In this technique, oligos such as F2 and R1 or R2 and F3 do not overlap, but they abut. Again three separate PCR reactions are performed. The products from tube 1 and tube 2 (see Step 1) are ligated, and then amplified with outside primers F1 and R4. This product is then ligated with the product from Step 1, tube 3. The final products are then amplified with primers F1 and R3.

The present invention is directed to novel proteins and nucleic acids possessing interferon-beta activity (sometimes referred to herein as "IbA proteins" and "IbA nucleic acids"). The proteins are generated using a system previously described in WO98/47089 and U.S. Ser. Nos. 09/058,459, 09/127,926, 60/104,612, 60/158,700, 09/419,351,
60/181,630, 60/186,904, and U.S. patent application, entitled Protein Design Automation For Protein Libraries (Filed: Apr. 14, 2000; Inventor: Bassil Dahiyat), all of which are expressly incorporated by reference in their entirety, that is a computational modeling system that allows the generation of extremely stable proteins without necessarily disturbing the biological functions of the protein itself. In this way, novel IbA proteins and nucleic acids are generated, that can have a plurality of mutations in comparison to the wild-type enzyme yet retain significant activity.

Generally, there are a variety of computational methods that can be used to generate the IbA proteins of the invention. In a preferred embodiment, sequence based methods are used. Alternatively, structure based methods, such as PDA, described in detail below, are used.

Similarly, molecular dynamics calculations can be used to computationally screen sequences by individually calculating mutant sequence scores and compiling a rank ordered list.

In a preferred embodiment, residue pair potentials can be used to score sequences (Miyazawa et al., Macromolecules 18(3):534-552 (1985), expressly incorporated by reference) during computational screening.

In a preferred embodiment, sequence profile scores (Bowie et al., Science 253(5016):164-70 (1991), incorporated by reference) and/or potentials of mean force (Hendlich et al., J. Mol. Biol. 216(1):167-180 (1990), also incorporated by reference) can also be calculated to score sequences. These methods assess the match between a sequence and a 3D protein structure and hence can act to screen for fidelity to the protein structure. By using different scoring functions to rank sequences, different regions of sequence space can be sampled in the computational screen.

Furthermore, scoring functions can be used to screen for sequences that would create metal or co-factor binding sites in the protein (Hellinga, Fold Des. 3(1):R1-8 (1998), hereby expressly incorporated by reference). Similarly, scoring functions can be used to screen for sequences that would create disulfide bonds in the protein. These potentials attempt to specifically modify a protein structure to introduce a new structural motif.

In a preferred embodiment, sequence and/or structural alignment programs can be used to generate the IbA proteins of the invention. As is known in the art, there are a number of sequence-based alignment programs, including for example, Smith-Waterman searches, Needleman-Wunsch, Double Affine Smith-Waterman, frame search, Gribskov/GCG profile search, Gribskov/GCG profile scan, profile frame search, Bucher generalized profiles, Hidden Markov models, Hframe, Double Frame, Blast, Psi-Blast, Clustal, and GeneWise.

As is known in the art, there are a number of sequence alignment methodologies that can be used. For example, sequence homology based alignment methods can be used to create sequence alignments of proteins related to the target structure (Altschul et al., J. Mol. Biol. 215(3):403-410 (1990), Altschul et al., Nucleic Acids Res. 25:3389-3402 (1997), both incorporated by reference). These sequence alignments are then examined to determine the observed sequence variations. These sequence variations are tabulated to define a set of IbA proteins.

Sequence based alignments can be used in a variety of ways. For example, a number of related proteins can be aligned, as is known in the art, and the "variable" and "conserved" residues defined; that is, the residues that vary or remain identical between the family members can be defined. These results can be used to generate a probability table, as outlined below. Similarly, these sequence variations can be tabulated and a secondary library defined from them as defined below. Alternatively, the allowed sequence variations can be used to define the amino acids considered at each position during the computational screening. Another variation is to bias the score for amino acids that occur in the sequence alignment, thereby increasing the likelihood that they are found during computational screening but still allowing consideration of other amino acids. This bias would result in a focused library of IbA proteins but would not eliminate from consideration amino acids not found in the alignment. In addition, a number of other types of bias may be introduced. For example, diversity may be forced; that is, a "conserved" residue is chosen and altered to force diversity on the protein and thus sample a greater portion of the sequence space. Alternatively, the positions of high variability between family members (i.e. low conservation) can be randomized, either using all or a subset of amino acids. Similarly, outlier residues, either positional outliers or [illegible], may be eliminated.

Similarly, structural alignment of structurally related proteins can be done to generate sequence alignments (Orengo et al., Structure 5(8):1093-108 (1997); Holm et al., Nucleic Acids Res. 26(1):316-9 (1998), both of which are incorporated by reference). These sequence alignments can then be examined to determine the observed sequence variations. Libraries can be generated by predicting secondary structure from sequence, and then selecting sequences that are compatible with the predicted secondary structure. There are a number of secondary structure prediction methods such as helix-coil transition theory (Munoz and Serrano, Biopolymers 41:495, 1997), neural networks, local structure alignment and others (e.g., see Selbig et al., Bioinformatics 15:1039-46, 1999).

Similarly, as outlined above, other computational methods are known, including, but not limited to, sequence profiling [Bowie and Eisenberg, Science 253(5016):164-70 (1991)], rotamer library selections [Dahiyat and Mayo, Protein Sci. 5(5):895-903 (1996); Dahiyat and Mayo, Science 278(5335):82-7 (1997); Desjarlais and Handel, Protein Science 4:2006-2018 (1995); Harbury et al., Proc. Natl. Acad. Sci. U.S.A. 92(18):8408-8412 (1995); Kono et al., Proteins: Structure, Function and Genetics 19:244-255 (1994); Hellinga and Richards, Proc. Natl. Acad. Sci. U.S.A. 91:5803-5807 (1994)]; and residue pair potentials [Jones, Protein Science 3:567-574 (1994)]; PROSA [Hendlich et al., J. Mol. Biol. 216:167-180 (1990)]; THREADER [Jones et al., Nature 358:86-89 (1992)], and other inverse folding methods such as those described by Simons et al. [Proteins, 34:535-543 (1999)], Levitt and Gerstein [Proc. Natl. Acad. Sci. U.S.A., 95:5913-5920 (1998)], Godzik and Skolnick [Proc. Natl. Acad. Sci. U.S.A., 89:12098-102 (1992)], Godzik et al. [J. Mol. Biol. 227:227-38 (1992)], and other profile methods [Gribskov et al., Proc. Natl. Acad. Sci. U.S.A. 84:4355-4358 (1987) and Fischer and Eisenberg, Protein Sci. 5:947-955 (1996), Rice and Eisenberg, J. Mol. Biol. 267:1026-1038 (1997)], all of which are expressly incorporated by reference. In addition, other computational methods such as those described by Koehl and Levitt (J. Mol. Biol. 293:1161-1181 (1999); J. Mol. Biol. 293:1183-1193 (1999); expressly incorporated by reference) can be used to create a protein sequence library which can optionally then be used to generate a smaller secondary library for use in experimental screening for improved properties and function. In addition, there are computational methods based on force field calculations
such as SCMF that can be used as well; for SCMF, see backbone, all possible sequences of rotamers must be
Delarue et al., Pac. Symp. Biocomput. 109-21 (1997); Koehl screened, where each backbone position can be occupied
et al., J. Mol. Biol. 239:249-75 (1994); Koehl et al., Nat. either by each amino acid in all its possible rotameric states,
Struct. Biol. 2:163-70 (1995); Koehl et al., Curr. Opin. or a subset of amino acids, and thus a subset of rotamers.
Struct. Biol. 6:222-6 (1996); Koehl et al., J. Mol. Biol. Two sets of interactions are then calculated for each
293:1183-93 (1999); Koehl et al., J. Mol. Biol. 293:1161-81 rotamer at every position: the interaction of the rotamer side
(1999); Lee, J. Mol. Biol. 236:918-39 (1994); and Vasquez, chain with all or part of the backbone (the "singles" energy,
Biopolymers 36:53-70 (1995); all of which are expressly also called the rotamer/template or rotamer/backbone
incorporated by reference. Other force field calculations that energy), and the interaction of the rotamer side chain with all
can be used to optimize the conformation of a sequence other possible rotamers at every other position or a subset of
within a computational method, or to generate de novo the other positions (the "doubles" energy, also called the
optimized sequences as outlined herein include, but are not rotamer/rotamer energy). The energy of each of these interactions
limited to, OPLS-AA [Jorgensen et al., J. Am. Chem. Soc. is calculated through the use of a variety of scoring
118:11225-11236 (1996); Jorgensen, W. L.; BOSS, Version functions, which include the energy of van der Waals
4.1; Yale University: New Haven, CT (1999)]; OPLS forces, the energy of hydrogen bonding, the energy of
[Jorgensen et al., J. Am. Chem. Soc. 110:1657ff (1988); secondary structure propensity, the energy of surface area
solvation and the electrostatics. Thus, the total energy of
Jorgensen et al., J. Am. Chem. Soc. 112:4768ff (1990)]; each rotamer interaction, both with the backbone and other
UNRES (United Residue Forcefield; Liwo et al., Protein rotamers, is calculated, and stored in a matrix form.
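
The "singles" and "doubles" energies described in this passage can be pictured as precomputed lookup tables. The following Python sketch is illustrative only and is not taken from the source: `backbone_score` and `pair_score` are hypothetical stand-ins for the real scoring functions (van der Waals, hydrogen bonding, and so on), and the rotamer labels are arbitrary placeholders.

```python
from itertools import combinations

def build_energy_tables(rotamers, backbone_score, pair_score):
    """Precompute singles and doubles energies.

    rotamers[i] is the list of candidate rotamers at position i.
    singles[i][r]         -> rotamer/backbone energy
    doubles[(i, j)][r][u] -> rotamer/rotamer energy, stored for i < j
    """
    singles = [[backbone_score(i, rot) for rot in rots]
               for i, rots in enumerate(rotamers)]
    doubles = {}
    for i, j in combinations(range(len(rotamers)), 2):
        doubles[(i, j)] = [[pair_score(i, a, j, b) for b in rotamers[j]]
                           for a in rotamers[i]]
    return singles, doubles

# Toy scoring functions (assumptions for illustration, not a force field).
def backbone_score(pos, rot):
    return 0.1 * pos + (1.0 if rot.startswith("A") else 0.0)

def pair_score(pos_a, rot_a, pos_b, rot_b):
    return -1.0 if rot_a[0] == rot_b[0] else 0.5

# Hypothetical rotamer labels: residue letter plus a conformer index.
rotamers = [["L1", "L2", "A1"], ["V1", "I1"], ["L1", "F1"]]
singles, doubles = build_energy_tables(rotamers, backbone_score, pair_score)
```

Storing every pair once, keyed by `(i, j)` with `i < j`, keeps the table size proportional to the number of rotamer pairs, which is what lets later steps work from lookups alone.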

[illegible] The discrete nature of rotamer sets allows a simple

Liwo et al., J. Comp. Chem. calculation of the number of rotamer sequences to be tested.

18:849-873 (1997); Liwo et al., J. Comp. Chem. 18:874-884 A backbone of length n with m possible rotamers per

(1997); Liwo et al., J. Comp. Chem. 19:259-276 (1998); position will have m^n possible rotamer sequences, a number

Forcefield for Protein Structure Prediction (Liwo et al., Proc. which grows exponentially with sequence length and renders

Natl. Acad. Sci. U.S.A. 96:5482-5485 (1999)]; ECEPP/3 the calculations either unwieldy or impossible in real

[Liwo et al., J. Protein Chem. 13(4):375-80 (1994)]; time. Accordingly, to solve this combinatorial search

AMBER 1.1 force field (Weiner et al., J. Am. Chem. Soc. problem, a "Dead End Elimination" (DEE) calculation is

106:765-784); AMBER 3.0 force field [U. C. Singh et al., performed. The DEE calculation is based on the fact that if

Proc. Natl. Acad. Sci. U.S.A. 82:755-759 (1985)]; the worst total interaction of a first rotamer is still better than

CHARMM and CHARMM22 (Brooks et al., J. Comp. the best total interaction of a second rotamer, then the second

Chem. 4:187-217); cvff3.0 [Dauber-Osguthorpe et al., Proteins: rotamer cannot be part of the global optimum solution. Since

Structure, Function and Genetics, 4:31-47 (1988)]; the energies of all rotamers have already been calculated, the

cff91 (Maple et al., J. Comp. Chem. 15:162-182); also, the DEE approach only requires sums over the sequence length

DISCOVER (cvff and cff91) and AMBER force fields are to test and eliminate rotamers, which speeds up the calculations

used in the INSIGHT molecular modeling package (Biosym/ considerably. DEE can be rerun comparing pairs of

MSI, San Diego Calif.) and CHARMM is used in the rotamers, or combinations of rotamers, which will eventually

QUANTA molecular modeling package (Biosym/MSI, San result in the determination of a single sequence which

Diego Calif.), all of which are expressly incorporated by represents the global optimum energy.
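
The elimination rule stated in this passage (a rotamer is discarded when its best possible total interaction is still worse than the worst possible total interaction of a competitor at the same position) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the PDA implementation: the energy tables are invented toy numbers, lower energy is taken to be better, and only the singles form of the criterion is shown. With n = 3 positions and m = 2 rotamers each, there are m^n = 8 rotamer sequences, so the result can be checked by brute force.

```python
from itertools import product

def pair_energy(doubles, i, r, j, u):
    """Look up a doubles energy regardless of index order."""
    return doubles[(i, j)][r][u] if i < j else doubles[(j, i)][u][r]

def dead_end_elimination(singles, doubles):
    """Return, per position, the set of rotamer indices that survive DEE."""
    n = len(singles)
    alive = [set(range(len(s))) for s in singles]
    changed = True
    while changed:
        changed = False
        for i in range(n):
            others = [j for j in range(n) if j != i]
            for r in sorted(alive[i]):
                # Best case for r: its most favourable surviving partners.
                best_r = singles[i][r] + sum(
                    min(pair_energy(doubles, i, r, j, u) for u in alive[j])
                    for j in others)
                for t in sorted(alive[i] - {r}):
                    # Worst case for t: its least favourable surviving partners.
                    worst_t = singles[i][t] + sum(
                        max(pair_energy(doubles, i, t, j, u) for u in alive[j])
                        for j in others)
                    if best_r > worst_t:
                        alive[i].discard(r)  # r can never beat t
                        changed = True
                        break
    return alive

def total_energy(assignment, singles, doubles):
    """Sum of singles plus all pairwise doubles for one rotamer sequence."""
    n = len(assignment)
    energy = sum(singles[i][assignment[i]] for i in range(n))
    energy += sum(doubles[(i, j)][assignment[i]][assignment[j]]
                  for i in range(n) for j in range(i + 1, n))
    return energy

# Toy energy tables (invented values, for illustration only).
singles = [[0.0, 5.0], [0.0, 1.0], [0.0, 4.0]]
doubles = {
    (0, 1): [[-1.0, 0.5], [0.0, 1.0]],
    (0, 2): [[0.0, 0.5], [1.0, -0.5]],
    (1, 2): [[0.5, 0.0], [-1.0, 1.0]],
}
n_sequences = 2 ** 3  # m^n rotamer sequences for m = 2, n = 3
survivors = dead_end_elimination(singles, doubles)
best = min(product(range(2), repeat=3),
           key=lambda a: total_energy(a, singles, doubles))
```

In this toy case a single rotamer survives at each position, and it matches the brute-force optimum `best`. In general DEE may leave several rotamers per position, which is why the text mentions rerunning it on pairs or combinations of rotamers.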
reference. In fact, as is outlined below, these force field Once the global solution has been found, a Monte Carlo

methods may be used to generate the secondary library search may be done to generate a rank-ordered list of

directly; that is, no primary library is generated; rather, these sequences in the neighborhood of the DEE solution. Starting

methods can be used to generate a probability table from at the DEE solution, random positions are changed to other

which the secondary library is directly generated. rotamers, and the new sequence energy is calculated. If the

In a preferred embodiment, the computational method new sequence meets the criteria for acceptance, it is used as

used to generate the primary library is Protein Design a starting point for another jump. After a predetermined

Automation (PDA), as is described in U.S. Ser. Nos. 60/061,097, number of jumps, a rank-ordered list of sequences is generated.

60/043,464, 60/054,678, 09/127,926, 60/104,612, Monte Carlo searching is a sampling technique to

60/158,700, 09/419,351, 60/181,630, 60/186,904, U.S. explore sequence space around the global minimum or to

patent application, entitled Protein Design Automation For find new local minima distant in sequence space. As is

Protein Libraries (Filed: Apr. 14, 2000; Inventor: Bassil additionally outlined below, there are other sampling techniques

Dahiyat) and PCT US98/07254, all of which are expressly that can be used, including Boltzmann sampling,

incorporated herein by reference. Briefly, PDA can be genetic algorithm techniques and simulated annealing. In

described as follows. A known protein structure is used as addition, for all the sampling techniques, the kinds of jumps

the starting point. The residues to be optimized are then allowed can be altered (e.g. random jumps to random

identified, which may be the entire sequence or subset(s) residues, biased jumps (to or away from wild-type, for

thereof. The side chains of any positions to be varied are example), jumps to biased residues (to or away from similar

then removed. The resulting structure consisting of the residues, for example), etc.). Similarly, for all the sampling

protein backbone and the remaining sidechains is called the techniques, the acceptance criteria of whether a sampling

template. Each variable residue position is then preferably jump is accepted can be altered.
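
The Monte Carlo search described in this passage (start from the DEE solution, change a random position to another rotamer, accept or reject the jump, and finish with a rank-ordered list) might be sketched as below. This is a hedged illustration: the text leaves the acceptance criteria open, so a Metropolis rule with a temperature-like parameter `kT` is assumed here, and `toy_energy`, the rotamer counts and the jump count are invented for the example.

```python
import math
import random

def monte_carlo_rank(start, n_rotamers, energy, n_jumps=2000, kT=1.0, seed=0):
    """Sample rotamer sequences around `start`; return them ranked by energy."""
    rng = random.Random(seed)
    current = tuple(start)
    e_current = energy(current)
    visited = {current: e_current}
    for _ in range(n_jumps):
        # Jump: change one randomly chosen position to a different rotamer.
        pos = rng.randrange(len(current))
        options = [r for r in range(n_rotamers[pos]) if r != current[pos]]
        if not options:
            continue
        candidate = current[:pos] + (rng.choice(options),) + current[pos + 1:]
        e_new = energy(candidate)
        # Metropolis acceptance (an assumption; other criteria can be swapped in).
        if e_new <= e_current or rng.random() < math.exp(-(e_new - e_current) / kT):
            current, e_current = candidate, e_new
            visited[current] = e_current
    # Rank-ordered list of (sequence, energy), lowest energy first.
    return sorted(visited.items(), key=lambda item: item[1])

def toy_energy(seq):
    """Invented energy with a single minimum at rotamer index 1 everywhere."""
    return float(sum((r - 1) ** 2 for r in seq))

ranked = monte_carlo_rank(start=(1, 1, 1, 1), n_rotamers=[3, 3, 3, 3],
                          energy=toy_energy)
```

Because the search starts at the global minimum of the toy energy, that sequence stays first in `ranked`; the remaining entries are the accepted neighbours in order of increasing energy. Changing the jump proposal or the acceptance test corresponds to the variations the text lists (biased jumps, Boltzmann sampling, simulated annealing).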

classified as a core residue, a surface residue, or a boundary As outlined in U.S. Ser. No. 09/127,926, the protein

residue; each classification defines a subset of possible backbone (comprising (for a naturally occurring protein) the

amino acid residues for the position (for example, core nitrogen, the carbonyl carbon, the α-carbon, and the carbonyl

residues generally will be selected from the set of hydrophobic oxygen, along with the direction of the vector from the

residues, surface residues generally will be selected α-carbon to the β-carbon) may be altered prior to the

from the hydrophilic residues, and boundary residues may computational analysis, by varying a set of parameters

be either). Each amino acid can be represented by a discrete called supersecondary structure parameters.
set of all allowed conformers of each side chain, called Once a protein structure backbone is generated (with

rotamers. Thus, to arrive at an optimal sequence for a alterations, as outlined above) and input into the computer,
explicit hydrogens are added if not included within the may all be fixed in their amino acid identity and a single

structure (for example, if the structure was generated by rotamer conformation, or "floated", which only fixes the

X-ray crystallography, hydrogens must be added). After identity but not the rotamer conformation.

hydrogen addition, energy minimization of the structure is [illegible]

run, to relax the hydrogens as well as the other atoms, bond [illegible] be chosen as variable

angles and bond lengths. In a preferred embodiment, this is [illegible] that confer undesirable biological

done by doing a number of steps of conjugate gradient attributes, such as susceptibility to proteolytic degradation,

minimization [Mayo et al., J. Phys. Chem. 94:8897 (1990)] dimerization or aggregation sites, glycosylation sites which

of atomic coordinate positions to minimize the Dreiding lead to immune responses, unwanted binding activity,

force field with no electrostatics. Generally from about 10 to unwanted allostery, undesirable enzyme activity but with a

about 250 steps is preferred, with about 50 being most preservation of binding, etc.

preferred. In a preferred embodiment, each variable position is

The protein backbone structure contains at least one classified as either a core, surface or boundary residue

variable residue position. As is known in the art, the position, although in some cases, as explained below, the

residues, or amino acids, of proteins are generally sequentially variable position may be set to glycine to minimize backbone

numbered starting with the N-terminus of the protein. strain. In addition, as outlined herein, residues need not

Thus a protein having a methionine at its N-terminus is said be classified; they can be chosen as variable and any set of

to have a methionine at residue or amino acid position 1, amino acids may be used. Any combination of core, surface

with the next residues as 2, 3, 4, etc. At each position, the and boundary positions can be utilized: core, surface and

wild type (i.e. naturally occurring) protein may have one of boundary residues; core and surface residues; core and

at least 20 amino acids, in any number of rotamers. By boundary residues, and surface and boundary residues, as

"variable residue position" herein is meant an amino acid well as core residues alone, surface residues alone, or

position of the protein to be designed that is not fixed in the boundary residues alone.

design method as a specific residue or rotamer, generally the The classification of residue positions as core, surface or

wild-type residue or rotamer. boundary may be done in several ways, as will be appreciated

In a preferred embodiment, all of the residue positions of by those in the art. In a preferred embodiment, the

the protein are variable. That is, every amino acid side chain classification is done via a visual scan of the original protein

may be altered in the methods of the present invention. This backbone structure, including the side chains, and assigning

is particularly desirable for smaller proteins, although the a classification based on a subjective evaluation of one

present methods allow the design of larger proteins as well. skilled in the art of protein modelling. Alternatively, a

While there is no theoretical limit to the length of the protein preferred embodiment utilizes an assessment of the orientation

which may be designed this way, there is a practical computational of the Cα-Cβ vectors relative to a solvent accessible

limit. surface computed using only the template Cα atoms, as

In an alternate preferred embodiment, only some of the outlined in U.S. Ser. Nos. 60/061,097, 60/043,464, 60/054,678,

residue positions of the protein are variable, and the remainder 09/127,926, 60/104,612, 60/158,700, 09/419,351,

are "fixed", that is, they are identified in the three 60/181,630, 60/186,904, U.S. patent application, entitled

dimensional structure as being in a set conformation. In Protein Design Automation For Protein Libraries (Filed:

some embodiments, a fixed position is left in its original Apr. 14, 2000; Inventor: Bassil Dahiyat) and PCT US98/07254.

conformation (which may or may not correlate to a specific Alternatively, a surface area calculation can be done.

rotamer of the rotamer library being used). Alternatively, Suitable core and boundary positions for IbA proteins are

residues may be fixed as a non-wild type residue; for outlined below.

example, when known site-directed mutagenesis techniques Once each variable position is classified as either core,

have shown that a particular residue is desirable (for surface or boundary, a set of amino acid side chains, and thus

example, to eliminate a proteolytic site or alter the substrate a set of rotamers, is assigned to each position. That is, the set

specificity of an enzyme), the residue may be fixed as a of possible amino acid side chains that the program will

particular amino acid. Alternatively, the methods of the allow to be considered at any particular position is chosen.

present invention may be used to evaluate mutations de Subsequently, once the possible amino acid side chains are

novo, as is discussed below. In an alternate preferred chosen, the set of rotamers that will be evaluated at a

embodiment, a fixed position may be "floated"; the amino particular position can be determined. Thus, a core residue

acid at that position is fixed, but different rotamers of that will generally be selected from the group of hydrophobic

amino acid are tested. In this embodiment, the variable residues consisting of alanine, valine, isoleucine, leucine,

residues may be at least one, or anywhere from 0.1% to phenylalanine, tyrosine, tryptophan, and methionine (in

99.9% of the total number of residues. Thus, for example, it some embodiments, when the α scaling factor of the van der

may be possible to change only a few (or one) residues, or Waals scoring function, described below, is low, methionine

most of the residues, with all possibilities in between. 55 is removed from the set), and the rotamer set for each core 

In a preferred embodiment, residues which can be fixed position potentially includes rotamers for these eight amino 

include, but are not limited to, structurally or biologicaUy acid side chains (all the rotamers if a backbone independent 

functional residues; alternatively, biologically functional library is used, and subsets if a rotamer dependent backbone 

residues may specifically not be fixed. For example, residues is used). SimQarly, surface positions are generally selected 

which are known to be important for biological activity, such 60 from the group of hydrophilic residues consisting of alanine, 

as the residues which the binding site for a binding partner serine, threonine, aspartic acid, asparagine, glutamine, 

(ligand/receplor, antigen/antibody, etc.), phosphorylation or glutamic acid, arginine, lysine and histidine. The rotamer set 

glycosylation sites which are crucial to biological function, for each surface position thus includes rotamers for these ten 

or structurally important residues, such as disulfide bridges, residues. Finally, boundary positions are generally chosen 

metal binding sites, critical hydrogen bonding residues, 65 from alanine, serine, threonine, aspartic acid, asparagine, 

residues critical for backbone conformation such as proline glutamine, glutamic acid, arginine, lysine histidine, valine, 

or glycine, residues critical for packing interactions, etc. isoleucine, leucine, phenylalanine, tyrosine, tryptophan, and 
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methionine. The rotanier set for each boundary position thus US98/07254, any combination of these scoring functions, 

potentially includes every rotamer for these seventeen resi- either alone or in combination, may be used. Once the 

dues (assuming cysteine, glycine and proline are not used, scoring functions to be used are identified for each variable 

although they can be). Additionally, in some preferred position, the preferred first step in the computational analy- 
embodimenls, a set of 18 naturally occuring amino acids (all 5 sis comprises the determination of the interaction of each 

except cysteine and proline, which are known to be particu- possible rotamer with all or part of the remainder of the 

larly disruptive) are used. protein. That is, the energy of interaction, as measured by 

Thus, as will be appreciated by those in the art, there is a one or more of the scoring functions, of each possible 

computational benefit to classifying the residue positions, as rotamer at each variable residue position with cither the 

it decreases the number of calculations. It should also be backbone or other rotamers, is calculated. In a preferred 

noted that there may be situations where the sets of core, embodiment, the interaction of each rotamer with the entire 

boundary and surface residues are altered from those remainder of the protein, i.e. both the entire template and all 

described above; for example, under some circumstances, o^her rotamers, is done. However, as outlined above, it is 

one or more amino acids is either added or subtracted from possible to only model a portion of a protein, for example a 
the set of allowed amino acids. For example, some proteins ^5 domain of a larger protein, and thus in some cases, not all of 
which dimerize or multmerize, or have ligand binding sites, protein need be considered. The term "portion", or 

may contain hydrophobic surface residues, etc. In addition, similar grammatical equivalents thereof, as used herein, with 

residues that do not allow helix "capping" or the favorable regard to a protein refers to a fragment of that protein. This 

interaction with an a-helix dipole may be subtracted from a fragment may range in size from 5-10 amino acid residues 
set of allowed residues. This modification of amino acid 20 to the entire amino acid sequence minus one amino acid, 

groups is done on a residue by residue basis. Accordingly, the term "portion", as used herein, with regard 

In a preferred embodiment, proline, cysteine and glycine ^ nucleic refers to a fragment of that nucleic acid. This 
are not included in the list of possible amino acid side fragment may range in size from 6-10 nucleotides to the 
chains, and thus the rotamers for these side chains are not ^^^""^ sequence minus one nucleotide, 
used. However, in a preferred embodiment, when the vari- * preferred embodiment, the first step of the compu- 
able residue position has a (() angle (that is, the dihedral angle tational processmg is done by calculating two sets of inter- 
defined by 1) the carbonyl carbon of the preceding amino ^^^^^^"^ ^^^^ rotamer at every position: the interaction of 
acid; 2) the nitrogen atom of the current residue; 3) the rotamer side chain with the template or backbone (the 
a-carbon of the current residue; and 4) the carbonyl carbon "singles" energy), and the interaction of the rotamer side 
of the current residue) greater than 0\ the position is set to ^^^^^ possible rotamers at every other position 
glycine to minimize backbone strain. "doubles" energy), whether that position is varied or 

r\„^^ fK« • • J f u floated. It should be understood that the backbone in this 

Once the group of potential rotamers IS assigned for each :„^u.a^^ u^*u *u * f *u * ■ . . 

variable residue position, processing proceedsls outlined in ^ f.h^"f ^ ^T'h ""T" 

U.S. Ser. No. 09/127. 926 and PCT US98/07254. TTiis „ Lht'^"!u "1 V '"Tfi . ? ' 

processing step entails analyzing interactions of the rotamers ^^^^ residues are defined as a particular con- 

*u u J -^u . • u 1 u formation of an amino acid, 

with each other and with the protein backbone to generate t.. « • i » / . , x ... 

r. e — i* n «i. • ^hus. Singles (rotamer/template) energies are calculated 

optunized protein sequences. Simplistically, the processine r *i- • * r i i ^^^y^y 

■ „ ■ c 1 • - tor the interaction of every possible rotamer at everv vari- 

initially comprises the use of a number of scormg functions .1 ... * u u lu ^ i ai v an 

tocaIc;iateenergiesofinteractionsoflherotame^,eitherto ^ 1^ :f^l"!C?rr 'if f T'^^^ 

the backbone itselfor other rotamers. Preferred PoXscoring « ^ .^""^""g f^"'^"'"^' 'T^"^- f"' 'h« Mrogen bonding 

functions include, but are not limited to. a Van der Waal! ^""^ function, every hydrogen bonding atom of the 

potential scoring function, a hydrogen bond potential scor- ' and even, hydrogen bonding atom of the backbone 

♦• iI -rL - IS evaluated, and the E^o is ca culaled for each possible 

ing function, an atomic solvation scoring function, a sec- , ' • ui o- i i r l 

^ A , I • t J 1 rotamer at every variable position. Similarly, for the van der 

ondary structure propensity scoring function and an electro- wi i • f * c .i. . • 

. f A • t J u ji_ 1 .1 Waals scoring function, every atom of the rotamer is com- 

sta ic sconng function. As ^further described below a least „ ^^^f „f '^^ , J .^^^ ^ 

one sconng fiinc .on is used to score each position, although g^^^^one atoms of its own residue) and the E.,^ is calcu- 

the s«,nng functions may d^er depending on the position .^^^^ ^^^^ j^,^ J' ^^^^^^ 

classification or other considerations, like favorable inter- ^ ... y a/*- n j «/ i 

... u r J- I A jLi .L. I position. In addition, generally no van der Waals energy is 

action with an a- helix dipole. As outlmed below, the total i i . j f *u * * ^ i_ .1. i. ^ 1 

.... J • /l 11.' • . 50 calculated II the atoms are connected by three bonds or less, 

energy which IS used in the calculations is the sum of the n *i, * • i *• • t *• r r t. 

r. ' c J. ^.1 •• For the atomic solvation scormg function, the surface of the 

energy of each sconng function used at a particular position, • . • * *t, _r r,. . 1. 1^ 

■ ^ 11 u • c . - 1 rotamer IS measured against the surface of the template, and 

as IS generally shown m Equation 1: .u n r u ti . . • u, • 1 

^ Ihe b„ lor each possible rotamer at every variable residue 

£,^,r«^^.+«f«+n^*-6a«/,«,+«£'«+n^./« Equation 1 position is calculated, llie secondary structure propensity 

55 scoring function is also considered as a singles energy, and 

In Equation 1, the total energy is the sum of the energy of thus the total singles energy may contain an E^^ term. As will 

the van der Waals potential (E^^), the energy of atomic be appreciated by those in the art, many of these energy 

solvation(E„), the energy of hydrogen bonding (E;,.i,^^,.„^, terms will be close to zero, depending on the physical 

the energy of secondary structure (EJ) and the energy of distance between the rotamer and the template position; that 
electrostatic interaction (E^/^^). The term n is either 0 or 1, 60 is, the farther apart the two moieties, the lower the energy, 
depending on whether the term is to be considered for the For the calculation of "doubles" energy (rotamer/ 

particular residue position. rotamer), the interaction energy of each possible rotamer is 

As outlined in U.S. Ser. Nos. 60/061,097, 60/043,464, compared with every possible rotamer at all other variable 

60/054,678, 09/127,926, 60/104,612, 60/158,700, 09/419, residue positions. Thus, "doubles" energies are calculated 
351, 60/181,630, 60/186,904, U.S patent application, 65 for the interaction of every possible rotamer at every vari- 

entitled Protein Design Automation For Protein Libraries able residue position with every possible rotamer at every 

(Filed: Apr. 14, 2000; Inventor: Bassil Dahiyat) and PCT other variable residue position, using some or all of the 




scoring functions. Thus, for the hydrogen bonding scoring
function, every hydrogen bonding atom of the first rotamer
and every hydrogen bonding atom of every possible second
rotamer is evaluated, and the E_{h-bonding} is calculated for each
possible rotamer pair for any two variable positions.
Similarly, for the van der Waals scoring function, every atom
of the first rotamer is compared to every atom of every
possible second rotamer, and the E_{vdw} is calculated for each
possible rotamer pair at every two variable residue positions.
For the atomic solvation scoring function, the surface of the
first rotamer is measured against the surface of every
possible second rotamer, and the E_{as} for each possible rotamer
pair at every two variable residue positions is calculated.
The secondary structure propensity scoring function need
not be run as a doubles energy, as it is considered as a
component of the "singles" energy. As will be appreciated
by those in the art, many of these double energy terms will
be close to zero, depending on the physical distance between
the first rotamer and the second rotamer; that is, the farther
apart the two moieties, the lower the energy.

In addition, as will be appreciated by those in the art, a
variety of force fields can be used in the PDA calculations,
including, but not limited to, Dreiding I
and Dreiding II [Mayo et al., J. Phys. Chem. 94:8897
(1990)], AMBER [Weiner et al., J. Amer. Chem. Soc.
106:765 (1984) and Weiner et al., J. Comp. Chem. 106:230
(1986)], MM2 [Allinger, J. Chem. Soc. 99:8127 (1977),
Liljefors et al., J. Com. Chem. 8:1051 (1987)]; MMP2
[Sprague et al., J. Comp. Chem. 8:581 (1987)]; CHARMM
[Brooks et al., J. Comp. Chem. 106:187 (1983)]; GROMOS;
and MM3 [Allinger et al., J. Amer. Chem. Soc. 111:8551
(1989)], OPLS-AA [Jorgensen et al., J. Am. Chem. Soc.
118:11225-11236 (1996); Jorgensen, W. L.; BOSS, Version
4.1; Yale University: New Haven, Conn. (1999)]; OPLS
[Jorgensen et al., J. Am. Chem. Soc. 110:1657ff (1988);
Jorgensen et al., J. Am. Chem. Soc. 112:4768ff (1990)];
UNRES [United Residue Forcefield; Liwo et al., Protein
Science 2:1697-1714 (1993); Liwo et al., Protein Science
2:1715-1731 (1993); Liwo et al., J. Comp. Chem.
18:849-873 (1997); Liwo et al., J. Comp. Chem.
18:874-884 (1997); Liwo et al., J. Comp. Chem.
19:259-276 (1998); Forcefield for Protein Structure
Prediction (Liwo et al., Proc. Natl. Acad. Sci. U.S.A. 96:5482-5485
(1999))]; ECEPP/3 [Liwo et al., J. Protein Chem. 13(4)
:375-80 (1994)]; AMBER 1.1 force field (Weiner, et al., J. Am. Chem. Soc.
106:765-784); AMBER 3.0 force field (U. C. Singh et al.,
Proc. Natl. Acad. Sci. U.S.A. 82:755-759); CHARMM and
CHARMM22 (Brooks et al., J. Comp. Chem. 4:187-217);
cvff3.0 [Dauber-Osguthorpe, et al., Proteins: Structure,
Function and Genetics, 4:31-47 (1988)]; cff91 (Maple, et
al., J. Comp. Chem. 15:162-182); also, the DISCOVER
(cvff and cff91) and AMBER forcefields are used in the
INSIGHT molecular modeling package (Biosym/MSI, San
Diego Calif.) and CHARMM is used in the QUANTA
molecular modeling package (Biosym/MSI, San Diego
Calif.), all of which are expressly incorporated by reference.

Once the singles and doubles energies are calculated and
stored, the next step of the computational processing may
occur. As outlined in U.S. Ser. No. 09/127,926 and PCT
US98/07254, preferred embodiments utilize a Dead End
Elimination (DEE) step, and preferably a Monte Carlo step.

PDA, viewed broadly, has three components that may be
varied to alter the output (e.g. the primary library): the
scoring functions used in the process; the filtering technique;
and the sampling technique.

In a preferred embodiment, the scoring functions may be
altered. In a preferred embodiment, the scoring functions
outlined above may be biased or weighted in a variety of
ways. For example, a bias towards or away from a reference
sequence or family of sequences can be done; for example,
a bias towards wild-type or homolog residues may be used.
Similarly, the entire protein or a fragment of it may be
biased; for example, the active site may be biased towards
wild-type residues, or domain residues towards a particular
desired physical property can be done. Furthermore, a bias
towards or against increased energy can be generated.
Additional scoring function biases include, but are not limited to
applying electrostatic potential gradients or hydrophobicity
gradients, adding a substrate or binding partner to the
calculation, or biasing towards a desired charge or
hydrophobicity.

In addition, in an alternative embodiment, there are a
variety of additional scoring functions that may be used.
Additional scoring functions include, but are not limited to
torsional potentials, or residue pair potentials, or residue
entropy potentials. Such additional scoring functions can be
used alone, or as functions for processing the library after it
is scored initially. For example, a variety of functions
derived from data on binding of peptides to MHC (Major
Histocompatibility Complex) can be used to rescore a library
in order to eliminate proteins containing sequences which
can potentially bind to MHC, i.e. potentially immunogenic
sequences.

In a preferred embodiment, a variety of filtering
techniques can be done, including, but not limited to, DEE and
its related counterparts. Additional filtering techniques
include, but are not limited to branch-and-bound techniques
for finding optimal sequences (Gordon and Mayo, Structure
Fold. Des. 7:1089-98, 1999), and exhaustive enumeration of
sequences.

As will be appreciated by those in the art, once an
optimized sequence or set of sequences is generated, a
variety of sequence space sampling methods can be done,
either in addition to the preferred Monte Carlo methods, or
instead of a Monte Carlo search. That is, once a sequence or
set of sequences is generated, preferred methods utilize
sampling techniques to allow the generation of additional,
related sequences for testing.

These sampling methods can include the use of amino
acid substitutions, insertions or deletions, or recombinations
of one or more sequences. As outlined herein, a preferred
embodiment utilizes a Monte Carlo search, which is a series
of biased, systematic, or random jumps. However, there are
other sampling techniques that can be used, including
Boltzmann sampling, genetic algorithm techniques and simulated
annealing. In addition, for all the sampling techniques, the
kinds of jumps allowed can be altered (e.g. random jumps to
random residues; biased jumps (to or away from wild-type,
for example); jumps to biased residues (to or away from
similar residues, for example); jumps where multiple
residue positions are coupled (two residues always change
together, or never change together); and jumps where whole sets
of residues change to other sequences (e.g., recombination)).
Similarly, for all the sampling techniques, the acceptance
criteria of whether a sampling jump is accepted can be
altered.

In addition, it should be noted that the preferred methods
of the invention result in a rank ordered list of sequences;
that is, the sequences are ranked on the basis of some
objective criteria. However, as outlined herein, it is possible
to create a set of non-ordered sequences, for example by
generating a probability table directly (for example using
SCMF analysis or sequence alignment techniques) that lists
sequences without ranking them. The sampling techniques
outlined herein can be used in either situation.

In a preferred embodiment, Boltzmann sampling is done.
As will be appreciated by those in the art, the temperature
criteria for Boltzmann sampling can be altered to allow broad
searches at high temperature and narrow searches close to
local optima at low temperatures (see e.g., Metropolis et al.,
J. Chem. Phys. 21:1087, 1953).
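
The temperature dependence described above can be illustrated with a minimal sketch of a Metropolis-style acceptance test. The function names, the use of arbitrary energy units, and the folding of the Boltzmann constant into the temperature parameter are assumptions made for illustration; they are not taken from the PDA implementation.

```python
import math
import random


def acceptance_probability(delta_e, temperature):
    """Probability of accepting a jump that changes the energy by delta_e.

    Downhill (or neutral) jumps are always accepted; uphill jumps are
    accepted with probability exp(-delta_e / temperature). Units are
    arbitrary and the Boltzmann constant is folded into `temperature`
    (an assumption of this sketch).
    """
    if delta_e <= 0:
        return 1.0
    return math.exp(-delta_e / temperature)


def accept_jump(delta_e, temperature, rng=random):
    """Draw a random number and decide whether the jump is accepted."""
    return rng.random() < acceptance_probability(delta_e, temperature)
```

At a high temperature an uphill jump is accepted often, which gives a broad search; at a low temperature the same jump is almost always rejected, which keeps the search close to a local optimum.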

In a preferred embodiment, the sampling technique uti- 
lizes genetic algorithms, e.g., such as those described by 
Holland (Adaptation in Natural and Artificial Systems, 1975,
Ann Arbor, U. Michigan Press). Genetic algorithm analysis
generally takes generated sequences and recombines them
computationally, similar to a nucleic acid recombination
event, in a manner similar to "gene shuffling". Thus the
"jumps" of genetic algorithm analysis generally are multiple 
position jumps. In addition, as outlined below, correlated 
multiple jumps may also be done. Such jumps can occur 
with different crossover positions and more than one recom- 
bination at a time, and can involve recombination of two or 
more sequences. Furthermore, deletions or insertions 
(random or biased) can be done. In addition, as outlined 
below, genetic algorithm analysis may also be used after the 
secondary library has been generated. 
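
As a hedged illustration of the recombination jumps described above, the following sketch recombines two equal-length sequences at one or more randomly chosen crossover positions. The function name `crossover` and its parameters are hypothetical, and the sketch covers only two-parent recombination, not insertions or deletions.

```python
import random


def crossover(parent_a, parent_b, n_points=1, rng=random):
    """Recombine two equal-length sequences at n_points crossover positions.

    The child starts as a copy of parent_a and switches to the other
    parent at each crossover position, so every recombination is a
    multiple-position jump.
    """
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    if not 1 <= n_points < len(parent_a):
        raise ValueError("n_points must be between 1 and len - 1")
    # Pick distinct interior cut positions and walk through them in order.
    points = sorted(rng.sample(range(1, len(parent_a)), n_points))
    pieces = []
    source, other = parent_a, parent_b
    start = 0
    for cut in points + [len(parent_a)]:
        pieces.append(source[start:cut])
        source, other = other, source
        start = cut
    return "".join(pieces)
```

Calling `crossover` repeatedly with different `n_points` values corresponds to using different crossover positions and more than one recombination at a time.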

In a preferred embodiment, the sampling technique uti- 
lizes simulated annealing, e.g., such as described by Kirk- 
patrick et al. [Science. 220:671-680 (1983)]. Simulated 
annealing alters the cutoff for accepting good or bad jumps 
by altering the temperature. That is, the stringency of the 
cutoff is altered by altering the temperature. This allows 
broad searches at high temperature to new areas of sequence 
space, alternating with narrow searches at low temperature to
explore regions in detail. 
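
The effect of the temperature on the acceptance cutoff can be sketched as follows. This is a toy example under stated assumptions: the `energy` callable stands in for whatever scoring function is used, the geometric cooling schedule and all default parameter values are illustrative choices, and single-residue substitutions are the only jump type.

```python
import math
import random


def anneal(start, energy, alphabet, t_start=5.0, t_end=0.05,
           cooling=0.95, steps_per_t=50, rng=random):
    """Toy simulated annealing over sequences (illustrative only).

    `energy` is a caller-supplied function mapping a list of residues to
    a number (lower is better). The temperature is lowered geometrically,
    which tightens the cutoff for accepting unfavorable jumps.
    """
    current = list(start)
    e_cur = energy(current)
    best, e_best = current[:], e_cur
    t = t_start
    while t > t_end:
        for _ in range(steps_per_t):
            pos = rng.randrange(len(current))
            old = current[pos]
            current[pos] = rng.choice(alphabet)  # random single-site jump
            delta = energy(current) - e_cur
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                e_cur += delta  # jump accepted
                if e_cur < e_best:
                    best, e_best = current[:], e_cur
            else:
                current[pos] = old  # jump rejected, restore residue
        t *= cooling
    return "".join(best), e_best
```

Early iterations at high temperature accept most jumps and move widely through sequence space; late iterations at low temperature accept almost only improvements.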

In addition, as outlined below, these sampling methods 
can be used to further process a first set to generate addi- 
tional sets of IbA proteins. 

The computational processing results in a set of optimized 
IbA protein sequences. These optimized IbA protein 
sequences are generally significantly different from the 
wild-type IFN-β sequence from which the backbone was
taken. That is, each optimized IbA protein sequence prefer-
ably comprises at least about 3-10% variant amino acids
from the starting or wild type sequence, with at least about 
10-15% being preferred, with at least about 15-20% 
changes being more preferred and at least 25% being 
particularly preferred. 
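
The percentage of variant amino acids can be computed as in the short sketch below. It assumes two pre-aligned sequences of equal length with no gaps; the function name and the example sequences are hypothetical and are not IFN-β sequences.

```python
def percent_variant(wild_type, variant):
    """Percentage of positions at which `variant` differs from `wild_type`.

    Assumes the two sequences are already aligned, gap-free, and of
    equal length (an assumption of this sketch).
    """
    if len(wild_type) != len(variant):
        raise ValueError("sequences must have the same length")
    if not wild_type:
        raise ValueError("sequences must not be empty")
    differences = sum(1 for w, v in zip(wild_type, variant) if w != v)
    return 100.0 * differences / len(wild_type)
```

For example, a ten-residue sequence with one substitution gives `percent_variant("ACDEFGHIKL", "ACDEFGHIKV") == 10.0`.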

In a preferred embodiment, the IbA proteins of the inven- 
tion have 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 
18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 
34, 35, 36, 37, 38, 39, or 40 different residues from the 
human IFN-β sequence.

Thus, in the broadest sense, the present invention is 
directed to IbA proteins that have IFN-β activity. By "IFN-β
activity" or "IbA" herein is meant that the IbA protein
exhibits at least one, and preferably more, of the biological
functions of an IFN-β, as defined below. In one embodiment,
the biological function of an IbA protein is altered, prefer-
ably improved, over the corresponding biological activity of
an IFN-β.

By "protein" herein is meant at least two covalently 
attached amino acids, which includes proteins, polypeptides, 
oligopeptides and peptides. The protein may be made up of
naturally occurring amino acids and peptide bonds, or syn-
thetic peptidomimetic structures, i.e., "analogs" such as
peptoids [see Simon et al., Proc. Natl. Acad. Sci. U.S.A.
89(20):9367-71 (1992)], generally depending on the method
of synthesis. Thus "amino acid", or "peptide residue", as 
used herein means both naturally occurring and synthetic 
amino acids. For example, homo-phenylalanine, citrulline,
and norleucine are considered amino acids for the purposes
of the invention. "Amino acid" also includes amino acid 
residues such as proline and hydroxyproline. In addition, 
any amino acid representing a component of the IbA pro- 
teins can be replaced by the same amino acid but of the 
opposite chirality. Thus, any amino acid naturally occurring 
in the L-configuration (which may also be referred to as the
R or S, depending upon the structure of the chemical entity) 
may be replaced with an amino acid of the same chemical 
structural type, but of the opposite chirality, generally 
referred to as the D-amino acid but which can additionally
be referred to as the R- or the S-, depending upon its 
composition and chemical configuration. Such derivatives 
have the property of greatly increased stability, and therefore 
are advantageous in the formulation of compounds which 
may have longer in vivo half lives, when administered by 
oral, intravenous, intramuscular, intraperitoneal, topical, 
rectal, intraocular, or other routes. In the preferred 
embodiment, the amino acids are in the (S) or
L-configuration. If non-naturally occurring side chains are 
used, non-amino acid substituents may be used, for example 
to prevent or retard in vivo degradations. Proteins including 
non-naturally occurring amino acids may be synthesized or 
in some cases, made recombinantly; see van Hest et al., 
FEBS Lett. 428:(1-2) 68-70 May 22, 1998 and Tang et al.,
Abstr. Pap Am. Chem. S218:U138-U138 Part 2 Aug. 22, 
1999, both of which are expressly incorporated by reference 
herein. 

Additionally, modified amino acids or chemical deriva- 
tives of amino acids of consensus or fragments of IbA 
proteins, according to the present invention may be 
provided, which polypeptides contain additional chemical 
moieties or modified amino acids not normally a part of the 
protein. Covalent and non-covalent modifications of the 
protein are thus included within the scope of the present 
invention. 

Such modifications may be introduced into an IbA 
polypeptide by reacting targeted amino acid residues of the 
polypeptide with an organic derivatizing agent that is 
capable of reacting with selected side chains or terminal 
residues. The following examples of chemical derivatives 
are provided by way of illustration and not by way of 
limitation.

Aromatic amino acids may be replaced with D- or 
L-naphthylalanine, D- or L-phenylglycine, D- or L-2-
thieneylalanine, D- or L-1-, 2-, 3- or 4-pyreneylalanine, D- 
or L-3-thieneylalanine, D- or L-(2-pyridinyl)-alanine, D- or 
L-(3-pyridinyl)-alanine, D- or L-(2-pyrazinyl)-alanine, D- 
or L-(4-isopropyl)-phenylglycine, D-(trifluoromethyl)- 
phenylglycine, D-(trifluoromethyl)-phenylalanine, D-p- 
fluorophenylalanine, D- or L-p-biphenylphenylalanine, D- 
or L-p-methoxybiphenylphenylalanine, D- or L-2-indole 
(alkyl)alanines, and D- or L-alkylalanines where alkyl may be
substituted or unsubstituted methyl, ethyl, propyl, hexyl, 
butyl, pentyl, isopropyl, iso-butyl, sec-isotyl, iso-pentyl, 
non-acidic amino acids, of C1-C20. 

Acidic amino acids can be substituted with non- 
carboxylate amino acids while maintaining a negative 
charge, and derivatives or analogs thereof, such as the 
non-limiting examples of (phosphono)alanine, (phosphono)
glycine, (phosphono)leucine, (phosphono)isoleucine,
(phosphono)threonine, or (phosphono)serine; or sulfated
(e.g., — SO3H) threonine, serine, tyrosine. 

Other substitutions may include unnatural hydroxylated
amino acids that may be made by combining "alkyl" with
any natural amino acid. The term "alkyl" as used herein
refers to a branched or unbranched saturated hydrocarbon
group of 1 to 24 carbon atoms, such as methyl, ethyl,
n-propyl, isopropyl, n-butyl, isobutyl, t-butyl, octyl, decyl,
tetradecyl, hexadecyl, eicosyl, tetracosyl and the like.
Preferred alkyl groups herein contain 1 to 12 carbon atoms.
Also included within the definition of an alkyl group are 
cycloalkyl groups such as C5 and C6 rings, and heterocyclic 
rings with nitrogen, oxygen, sulfur or phosphorus. Alkyl 
also includes heteroalkyl, with heteroatoms of sulfur, 
oxygen, and nitrogen being preferred. Alkyl includes sub- 
stituted alkyl groups. By "substituted alkyl group" herein is 
meant an alkyl group further comprising one or more 
substitution moieties. A preferred heteroalkyl group is an 
alkyl amine. By "alkyl amine" or grammatical equivalents 
herein is meant an alkyl group as defined above, substituted 
with an amine group at any position. In addition, the alkyl 
amine may have other substitution groups, as outlined above 
for alkyl group. The amine may be primary (-NH2R),
secondary (-NHR), or tertiary (-NR3). Basic amino acids
may be substituted with alkyl groups at any position of the
naturally occurring amino acids lysine, arginine, ornithine,
citrulline, or (guanidino)-acetic acid, or other (guanidino)
alkyl-acetic acids, where "alkyl" is defined as above. Nitrile
derivatives (e.g., containing the CN-moiety in place of 
COOH) may also be substituted for asparagine or glutamine, 
and methionine sulfoxide may be substituted for methionine. 
Methods of preparation of such peptide derivatives are well 
known to one skilled in the art. 

In addition, any amide linkage in any of the IbA polypep- 
tides can be replaced by a ketomethylene moiety. Such 
derivatives are expected to have the property of increased 
stability to degradation by enzymes, and therefore possess 
advantages for the formulation of compounds which may 
have increased in vivo half lives, as administered by oral, 
intravenous, intramuscular, intraperitoneal, topical, rectal, 
intraocular, or other routes. 

Additional amino acid modifications of amino acids of 
IbA polypeptides of the present invention may include the 
following: Cysteinyl residues may be reacted with
alpha-haloacetates (and corresponding amines), such as
2-chloroacetic acid or chloroacetamide, to give carboxymethyl
or carboxyamidomethyl derivatives. Cysteinyl
residues may also be derivatized by reaction with compounds
such as bromotrifluoroacetone, alpha-bromo-beta-(5- 
imidozoyl)propionic acid, chloroacetyl phosphate, 
N-alkylmaleimides, 3-nitro-2-pyridyl disulfide, methyl 
2-pyridyl disulfide, p-chloromercuribenzoate, 
2-chloromercuri-4-nitrophenol, or chloro-7-nitrobenzo-2-
oxa-1,3-diazole.

Histidyl residues may be derivatized by reaction with
compounds such as diethylpyrocarbonate, e.g., at pH 5.5-7.0,
because this agent is relatively specific for the histidyl side
chain. Para-bromophenacyl bromide may also be used,
e.g., where the reaction is preferably performed in 0.1 M
sodium cacodylate at pH 6.0.

Lysinyl and amino terminal residues may be reacted with 
compounds such as succinic or other carboxylic acid
anhydrides. Derivatization with these agents is expected to have
the effect of reversing the charge of the lysinyl residues.
Other suitable reagents for derivatizing alpha-amino-
containing residues include compounds such as imidoesters,
e.g., methyl picolinimidate; pyridoxal phosphate;
pyridoxal; chloroborohydride; trinitrobenzenesulfonic acid;
O-methylisourea; 2,4-pentanedione; and transaminase-
catalyzed reaction with glyoxylate.

Arginyl residues may be modified by reaction with one or 
several conventional reagents, among them phenylglyoxal, 
2,3-butanedione, 1,2-cyclohexanedione, and ninhydrin
according to known method steps. Derivatization of arginine
residues requires that the reaction be performed in alkaline 
conditions because of the high pKa of the guanidine func- 
tional group. Furthermore, these reagents may react with the 
groups of lysine as well as the arginine epsilon-amino group. 

The specific modification of tyrosyl residues per se is 
well-known, such as for introducing spectral labels into 
tyrosyl residues by reaction with aromatic diazonium
compounds or tetranitromethane. N-acetylimidazole and
tetranitromethane may be used to form O-acetyl tyrosyl species
and 3-nitro derivatives, respectively. 

Carboxyl side groups (aspartyl or glutamyl) may be 
selectively modified by reaction with carbodiimides
(R'-N-C-N-R') such as 1-cyclohexyl-3-(2-morpholinyl-(4-ethyl)
carbodiimide or 1-ethyl-3-(4-azonia-4,4-dimethylpentyl)
carbodiimide. Furthermore aspartyl and glutamyl residues 
may be converted to asparaginyl and glutaminyl residues by 
reaction with ammonium ions. 

Glutaminyl and asparaginyl residues may be frequently 
deamidated to the corresponding glutamyl and aspartyl 
residues. Alternatively, these residues may be deamidated
under mildly acidic conditions. Either form of these residues 
falls within the scope of the present invention. 

The IFN-β may be from any number of organisms, with
IFN-βs from mammals being particularly preferred. Suitable
mammals include, but are not limited to, rodents (rats,
mice, hamsters, guinea pigs, etc.), primates, farm animals
(including sheep, goats, pigs, cows, horses, etc.) and in the
most preferred embodiment, from humans (this is sometimes
referred to herein as hIFN-β, the sequence of which is
depicted in FIG. 1). As will be appreciated by those in the
art, IFN-βs based on IFN-βs from mammals other than
humans may find use in animal models of human disease.
The GenBank accession numbers for a variety of mammalian
IFN-β species are as follows: bovine 69689, 124465
(IFN-β-1 precursor), 69688, 124467 (IFN-β-3 precursor),
69687, 124466 (IFN-β-2 precursor); dog 442673; sheep
310382; cat CAA69853, 1754718; pig 2411469, 164517;
mouse 69686, 6754304, 51551, 124470, 494203; rat
7438651, 2497434, 1616939; Macaca fascicularis 3766295;
horse 69685, 124468, 164229; human 69684, 124469,
4504603, 3318961, 3318960.

The IbA proteins of the invention exhibit at least one
biological function of an IFN-β. By "interferon-beta" or
"IFN-β" herein is meant a wild type IFN-β or an allelic
variant thereof. Thus, IFN-β refers to all forms of IFN-β that
are active in accepted IFN-β assays.

The IbA proteins of the invention exhibit at least one
biological function of an IFN-β. By "biological function" or
"biological property" herein is meant any one of the properties
or functions of an IFN-β, including, but not limited to,
the ability to effect cellular growth, in particular inhibition
of cell proliferation; the ability to effect cellular
differentiation, in particular induction of cell differentiation;
the ability to induce changes in cell morphology; the ability
to modulate the immune system; the ability to enhance
histocompatibility antigen expression; the ability to stimulate
immunoglobulin-Fc receptor expression on macrophages;
the ability to induce antibody production in B
lymphocytes; the ability to activate natural killer cells; the
ability to bind to an IFN receptor; the ability to bind to a cell
comprising an IFN receptor; the ability to treat multiple
sclerosis; the ability to treat idiopathic pulmonary fibrosis;
the ability to treat inflammatory diseases; the ability to treat
viral diseases, including treatment of infections caused by
papilloma viruses, such as genital warts and condylomata of
the uterine cervix; hepatitis viruses, such as acute/chronic
hepatitis B and non-A, non-B hepatitis (hepatitis C); herpes
viruses, such as herpes genitalis, herpes zoster, herpes
keratitis, and herpes simplex; viral encephalitis;
cytomegalovirus pneumonia; and prophylaxis of rhinovirus; the
ability to treat cancer, including treatment of several malignant
diseases such as osteosarcoma, basal cell carcinoma,
cervical dysplasia, glioma, acute myeloid leukemia, multiple
myeloma, Hodgkin's disease, melanoma, renal cancer, liver
cancer, and breast cancer.

All of these IbA proteins will exhibit at least 50% of the
receptor binding or biological activity as the wild type
IFN-β. More preferred are IbA proteins that exhibit at least
75%, even more preferred are IbA proteins that exhibit at
least 90%, and most preferred are IbA proteins that exhibit
more than 100% of the receptor binding or biological
activity as the wild type IFN-β. Biological assays, receptor
binding assays, anti-viral and anti-proliferation assays are
described in U.S. patents 4,450,103; 4,518,584; 4,588,585;
4,737,462; 4,738,844; 4,738,845; 4,753,795; 4,769,233;
4,793,995; 4,914,033; 4,959,314; 5,183,746; 5,376,567;
5,545,723; 5,730,969; 5,814,485; 5,869,603 and in e.g.,
Anderson et al., J. Biol. Chem. 257(19): 11301^ (1982);
Herberman et al., Nature 277(5693):221-3 (1979); Williams
et al., Nature 282(5739):582-6 (1979); Branca and Baglioni,
Nature 294(5843):768-70 (1981); Proc. Natl. Acad. Sci.
U.S.A. 81(18):5662-6 (1984); Fellous et al., Proc. Natl.
Acad. Sci. U.S.A. 79(10):3082-6 (1982); and Runkel et al.,
J. Biol. Chem. 273(14):8003-8 (1998), all of which are
expressly incorporated by reference.

In one embodiment, at least one biological property of the
IbA protein is altered when compared to the same property
of IFN-β. As outlined above, the invention provides IbA
nucleic acids encoding IbA polypeptides. The IbA polypeptide
preferably has at least one property, which is substantially
different from the same property of the corresponding
naturally occurring IFN-β polypeptide. The property of the
IbA polypeptide is the result of the PDA analysis of the present
invention.

The term "altered property" or grammatical equivalents
thereof in the context of a polypeptide, as used herein, refer
to any characteristic or attribute of a polypeptide that can be
selected or detected and compared to the corresponding
property of a naturally occurring protein. These properties
include, but are not limited to oxidative stability, substrate
specificity, substrate binding or catalytic activity, thermal
stability, alkaline stability, pH activity profile, resistance to
proteolytic degradation, Km, kcat, Km/kcat ratio, kinetic
association (Kon) and dissociation (Koff) rate, protein
folding, inducing an immune response, ability to bind to a
ligand, ability to bind to a receptor, ability to be secreted,
ability to be displayed on the surface of a cell, ability to
oligomerize, ability to signal, ability to stimulate cell
proliferation, ability to inhibit cell proliferation, ability to
induce apoptosis, ability to be modified by phosphorylation
or glycosylation, ability to treat disease.

Unless otherwise specified, a substantial change in any of
the above-listed properties, when comparing the property of
an IbA polypeptide to the property of a naturally occurring
IFN-β protein is preferably at least a 20%, more preferably,
50%, more preferably at least a 2-fold increase or decrease.

A change in oxidative stability is evidenced by at least
about 20%, more preferably at least 50% increase of activity
of an IbA protein when exposed to various oxidizing
conditions as compared to that of IFN-β. Oxidative stability is
measured by known procedures.

A change in alkaline stability is evidenced by at least
about a 5% or greater increase or decrease (preferably
increase) in the half life of the activity of an IbA protein
when exposed to increasing or decreasing pH conditions as
compared to that of IFN-β. Generally, alkaline stability is
measured by known procedures.

A change in thermal stability is evidenced by at least
about a 5% or greater increase or decrease (preferably
increase) in the half life of the activity of an IbA protein
when exposed to a relatively high temperature and neutral
pH as compared to that of IFN-β. Generally, thermal
stability is measured by known procedures.

Similarly, IbA proteins, for example, are experimentally
tested and validated in in vivo and in in vitro assays. Suitable
assays include, but are not limited to, e.g., examining their
binding affinity to natural occurring or variant receptors and
to high affinity agonists and/or antagonists. In addition to
cell-free biochemical affinity tests, quantitative comparisons
are made comparing kinetic and equilibrium binding
constants for the natural receptor to the naturally occurring
IFN-β and to the IbA proteins. The kinetic association rate
(Kon) and dissociation rate (Koff), and the equilibrium
binding constants (Kd) can be determined using surface plasmon
resonance on a BIAcore instrument following the standard
procedure in the literature [Pearce et al., Biochemistry
38:81-89 (1999)]. Comparisons of the binding constant between
a natural receptor and its corresponding naturally occurring
IFN-β with the binding constant of a natural occurring
receptor and an IbA protein are made in order to evaluate the
sensitivity and specificity of the IbA protein. Preferably,
binding affinity of the IbA protein to natural receptors and
agonists increases relative to the naturally occurring IFN-β,
while antagonist affinity decreases. IbA proteins with higher
affinity to antagonists relative to the IFN-β may also be
generated by the methods of the invention.

As described above, one biological function of an IbA
protein is the ability of the IbA protein to bind to cells
comprising an interferon receptor.

In a preferred embodiment, the assay system used to
determine IbA is an in vitro system using cells that either
express endogenous interferon receptors or cells stably
transfected with the gene encoding the human interferon
receptor. In this system, cell proliferation is measured as a
function of BrdU incorporation, which is incorporated into
the nucleic acid of proliferating cells. A decrease above
background of at least about 10%, with at least about 20%
being preferred, with at least about 30% being more
preferred and at least about 50%, 75% and 90% being
especially preferred is an indication of IbA.

In a preferred embodiment, the antigenic profile in the
host animal of the IbA protein is similar, and preferably
identical, to the antigenic profile of the host IFN-β; that is,
the IbA protein does not significantly stimulate the host
organism (e.g. the patient) to an immune response; that is,
any immune response is not clinically relevant and there is
no allergic response or neutralization of the protein by an
antibody. That is, in a preferred embodiment, the IbA protein
does not contain additional or different epitopes from the
IFN-β. By "epitope" or "determinant" herein is meant a
portion of a protein which will generate and/or bind an
antibody. Thus, in most instances, no significant amount of
antibodies are generated to an IbA protein. In general, this is
accomplished by not significantly altering surface residues,
as outlined below nor by adding any amino acid residues on
the surface which can become glycosylated, as novel
glycosylation can result in an immune response.

The IbA proteins and nucleic acids of the invention are
distinguishable from naturally occurring IFN-βs. By
"naturally occurring" or "wild type" or grammatical equivalents,
herein is meant an amino acid sequence or a nucleotide
sequence that is found in nature and includes allelic
variations; that is, an amino acid sequence or a nucleotide
sequence that usually has not been intentionally modified.
Accordingly, by "non-naturally occurring" or "synthetic" or
"recombinant" or grammatical equivalents thereof, herein is
meant an amino acid sequence or a nucleotide sequence that
is not found in nature; that is, an amino acid sequence or a
nucleotide sequence that usually has been intentionally
modified. It is understood that once a recombinant nucleic
acid is made and reintroduced into a host cell or organism,
it will replicate non-recombinantly, i.e., using the in vivo
cellular machinery of the host cell rather than in vitro
manipulations; however, such nucleic acids, once produced
recombinantly, although subsequently replicated
non-recombinantly, are still considered recombinant for the
purpose of the invention. Representative amino acid and
nucleotide sequences of a naturally occurring human IFN-β are
shown in FIG. 1. It should be noted that unless otherwise
stated, all positional numbering of IbA proteins and IbA
nucleic acids is based on these sequences. That is, as will be
appreciated by those in the art, an alignment of IFN-β
proteins and IbA proteins can be done using standard
programs, as is outlined below, with the identification of
"equivalent" positions between the two proteins. Thus, the
IbA proteins and nucleic acids of the invention are
non-naturally occurring; that is, they do not exist in nature.

Thus, in a preferred embodiment, the IbA protein has an
amino acid sequence that differs from a wild-type IFN-β
sequence by at least 3% of the residues. That is, the IbA
proteins of the invention are less than about 97% identical to
an IFN-β amino acid sequence. Accordingly, a protein is an
"IbA protein" if the overall homology of the protein
sequence to the amino acid sequence shown in FIG. 1A or
FIG. 1B (SEQ ID NO:1) is preferably less than about 97%,
more preferably less than about 95%, even more preferably
less than about 90% and most preferably less than 85%. In
some embodiments the homology will be as low as about 75
to 80%. Stated differently, based on the human IFN-β
sequence of 166 residues (see FIG. 1A) (SEQ ID NO:1), IbA
proteins have at least about 5 residues that differ from the
human IFN-β sequence (3%), with IbA proteins having from
5 residues to upwards of 62 residues being different from the
human IFN-β sequence. Preferred IbA proteins have 5-30
different residues with from about 5 to about 15 being
particularly preferred (that is, 3-9% of the protein is not
identical to human IFN-β).

In another preferred embodiment, IbA proteins have 2, 3,
4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21,
22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37,
38, 39, or 40 different residues from the human IFN-β
sequence.

Homology in this context means sequence similarity or
identity, with identity being preferred. As is known in the art,
a number of different programs can be used to identify
whether a protein (or nucleic acid as discussed below) has
sequence identity or similarity to a known sequence.
Sequence identity and/or similarity is determined using
standard techniques known in the art, including, but not
limited to, the local sequence identity algorithm of Smith &
Waterman, Adv. Appl. Math., 2:482 (1981), by the sequence
identity alignment algorithm of Needleman & Wunsch, J.
Mol. Biol., 48:443 (1970), by the search for similarity
method of Pearson & Lipman, Proc. Natl. Acad. Sci. U.S.A.,
85:2444 (1988), by computerized implementations of these
algorithms (GAP, BESTFIT, FASTA, and TFASTA in the
Wisconsin Genetics Software Package, Genetics Computer
Group, 575 Science Drive, Madison, Wis.), the Best Fit
sequence program described by Devereux et al., Nucl. Acid
Res., 12:387-395 (1984), preferably using the default
settings, or by inspection. Preferably, percent identity is
calculated by FastDB based upon the following parameters:
mismatch penalty of 1; gap penalty of 1; gap size penalty of
0.33; and joining penalty of 30, "Current Methods in
Sequence Comparison and Analysis," Macromolecule
Sequencing and Synthesis, Selected Methods and
Applications, pp 127-149 (1988), Alan R. Liss, Inc.

An example of a useful algorithm is PILEUP. PILEUP
creates a multiple sequence alignment from a group of
related sequences using progressive, pairwise alignments. It
can also plot a tree showing the clustering relationships used
to create the alignment. PILEUP uses a simplification of the
progressive alignment method of Feng & Doolittle, J. Mol.
Evol. 35:351-360 (1987); the method is similar to that
described by Higgins & Sharp CABIOS 5:151-153 (1989).
Useful PILEUP parameters include a default gap weight of
3.00, a default gap length weight of 0.10, and weighted end
gaps.

Another example of a useful algorithm is the BLAST
algorithm, described in: Altschul et al., J. Mol. Biol. 215,
403-410, (1990); Altschul et al., Nucleic Acids Res.
25:3389-3402 (1997); and Karlin et al., Proc. Natl. Acad.
Sci. U.S.A. 90:5873-5787 (1993). A particularly useful
BLAST program is the WU-BLAST-2 program which was
obtained from Altschul et al., Methods in Enzymology,
266:460-480 (1996) [http://blast.wustl.edu/blast/
README.html]. WU-BLAST-2 uses several search
parameters, most of which are set to the default values. The
adjustable parameters are set with the following values:
overlap span=1, overlap fraction=0.125, word threshold
(T)=11. The HSP S and HSP S2 parameters are dynamic
values and are established by the program itself depending
upon the composition of the particular sequence and
composition of the particular database against which the
sequence of interest is being searched; however, the values
may be adjusted to increase sensitivity.

An additional useful algorithm is gapped BLAST as
reported by Altschul et al., Nucl. Acids Res., 25:3389-3402.
Gapped BLAST uses BLOSUM-62 substitution scores;
threshold T parameter set to 9; the two-hit method to trigger
ungapped extensions; charges gap lengths of k a cost of
10+k; Xu set to 16, and Xg set to 40 for database search stage
and to 67 for the output stage of the algorithms. Gapped
alignments are triggered by a score corresponding to ~22
bits.

A % amino acid sequence identity value is determined by
the number of matching identical residues divided by the
total number of residues of the "longer" sequence in the
aligned region. The "longer" sequence is the one having the
most actual residues in the aligned region (gaps introduced
by WU-Blast-2 to maximize the alignment score are
ignored).

In a similar manner, "percent (%) nucleic acid sequence
identity" with respect to the coding sequence of the
polypeptides identified herein is defined as the percentage of
nucleotide residues in a candidate sequence that are identical with
the nucleotide residues in the coding sequence of the cell
cycle protein. A preferred method utilizes the BLASTN
module of WU-BLAST-2 set to the default parameters, with
overlap span and overlap fraction set to 1 and 0.125,
respectively.
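
As a minimal sketch of the percent identity calculation described above (matching identical residues divided by the residue count of the "longer" sequence in the aligned region, with alignment gaps ignored), the following Python function may be illustrative. The function name `percent_identity`, the use of "-" as the gap character, and the placeholder sequences are assumptions made for illustration only; they are not taken from WU-BLAST-2 or any other program named herein, and the alignment itself is presumed to have been produced beforehand.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over an already-aligned region.

    Identical residues are divided by the residue count of the "longer"
    sequence, i.e. the one with more actual (non-gap) residues in the
    aligned region. Gap characters are not counted as residues.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    # Count columns where both sequences carry the same residue (not a gap).
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )

    # Actual residues in each sequence, ignoring alignment gaps.
    residues_a = sum(1 for x in aligned_a if x != gap)
    residues_b = sum(1 for y in aligned_b if y != gap)
    longer = max(residues_a, residues_b)
    if longer == 0:
        raise ValueError("aligned region contains no residues")

    return 100.0 * matches / longer


# Hypothetical example: a 166-residue sequence with five substitutions
# leaves 161 identical positions, i.e. just under 97% identity.
wild_type = "A" * 166          # placeholder residues, not a real sequence
variant = "G" * 5 + "A" * 161  # five positions differ
identity = percent_identity(wild_type, variant)
```

Under this convention, a sequence of 166 residues carrying five substitutions scores 161/166, which falls just below the 97% threshold and is consistent with the "at least about 5 residues" figure given for human IFN-β.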

The alignment may include the introduction of gaps in the 
sequences to be aligned. In addition, for sequences which 
contain either more or fewer amino acids than the protein 
encoded by the sequence of FIG. 1, it is understood that in
one embodiment, the percentage of sequence identity will be
determined based on the number of identical amino acids in
relation to the total number of amino acids. Thus, for
example, sequence identity of sequences shorter than that
shown in FIG. 1, as discussed below, will be determined
using the number of amino acids in the shorter sequence, in
one embodiment. In percent identity calculations relative
weight is not assigned to various manifestations of sequence
variation, such as, insertions, deletions, substitutions, etc.

In one embodiment, only identities are scored positively
(+1) and all forms of sequence variation including gaps are
assigned a value of "0", which obviates the need for a
weighted scale or parameters as described below for
sequence similarity calculations. Percent sequence identity
can be calculated, for example, by dividing the number of
matching identical residues by the total number of residues
of the "shorter" sequence in the aligned region and
multiplying by 100. The "longer" sequence is the one having the
most actual residues in the aligned region.

Thus, IbA proteins of the present invention may be shorter
or longer than the amino acid sequence shown in FIG. 1A
(SEQ ID NO:1). Thus, in a preferred embodiment, included
within the definition of IbA proteins are portions or
fragments of the sequences depicted herein. Fragments of IbA
proteins are considered IbA proteins if a) they share at least
one antigenic epitope; b) have at least the indicated
homology; c) and preferably have IbA biological activity as
defined herein.

In a preferred embodiment, as is more fully outlined
below, the IbA proteins include further amino acid
variations, as compared to a wild type IFN-β, than those
outlined herein. In addition, as outlined herein, any of the
variations depicted herein may be combined in any way to
form additional novel IbA proteins.

In addition, IbA proteins can be made that are longer than
those depicted in the figures, for example, by the addition of
epitope or purification tags, as outlined herein, the addition
of other fusion sequences, etc. For example, the IbA proteins
of the invention may be fused to other therapeutic proteins
such as IL-11 or to other proteins such as Fc or serum
albumin for pharmacokinetic purposes. See for example
U.S. Pat. Nos. 5,766,883 and 5,876,969, both of which are
expressly incorporated by reference.

In a preferred embodiment, the IbA proteins comprise
variable residues in core residues.

Human IFN-β core residues are as follows: positions 1, 6,
10, 13, 14, 15, 17, 18, 21, 38, 50, 55, 56, 58, 59, 61, 62, 63,
66, 69, 70, 72, 74, 76, 77, 81, 84, 87, 90, 91, 94, 95, 98, 102,
114, 115, 118, 122, 125, 126, 129, 130, 132, 133, 136, 138,
139, 142, 143, 144, 146, 147, 150, 151, 153, 154, 157, 159,
160, 161, 163, and 164 (see FIG. 3). Accordingly, in a
preferred embodiment, IbA proteins have variable positions
selected from these positions.

The structure of human IFN-β as reported by Karpasus et
al. (supra) indicated that IFN-β forms a dimer consisting of
an A-chain and a B-chain.

Thus, in one embodiment, variable residues for the
A-chain are as follows: positions 1, 6, 10, 13, 14, 17, 18, 21,
38, 50, 55, 56, 58, 59, 61, 62, 63, 66, 69, 70, 72, 74, 76, 77,
81, 84, 87, 90, 91, 94, 95, 98, 102, 114, 115, 118, 122, 125,
126, 129, 130, 132, 133, 136, 138, 139, 142, 143, 144, 146,
147, 150, 151, 153, 154, 157, 159, 160, 161, 163, and 164
(see FIG. 3). Accordingly, in a preferred embodiment, IbA
proteins have variable positions selected from these
positions.

Thus, in another embodiment, variable residues for the
B-chain are as follows: positions 1, 6, 10, 13, 14, 15, 17, 18,
21, 38, 50, 55, 56, 58, 59, 61, 62, 63, 66, 69, 70, 72, 74, 76,
77, 81, 84, 87, 90, 91, 94, 95, 98, 102, 114, 115, 118, 122,
125, 126, 129, 130, 132, 133, 136, 138, 139, 142, 143, 144,
146, 147, 150, 151, 153, 154, 157, 159, 160, 161, 163 and
164 (see FIG. 3). Accordingly, in a preferred embodiment,
IbA proteins have variable positions selected from these
positions.

[illegible]

Alternatively, at least a majority (51%) of the variable
positions are selected from core residues, with at least about
75% of the variable positions being preferably selected from
core residue positions, and at least about 90% of the variable
positions being particularly preferred. A specifically
preferred embodiment has only core variable positions altered
compared to human IFN-β.

Particularly preferred embodiments where IbA proteins
have variable core positions as compared to human IFN-β
are shown in the Figures.

In one embodiment, the variable core positions are altered
to any of the other 19 amino acids. In a preferred
embodiment, the variable core residues are chosen from Ala,
Val, Phe, Ile, Leu, Tyr, Trp and Met. In another preferred
embodiment, the variable core residues are chosen from Ala,
Val, Leu, Ile, Phe, Tyr, and Trp. In another preferred
embodiment, the variable core residues are chosen from Ala,
Val, Leu, Ile, and Gly. In another preferred embodiment, the
variable core residues are chosen from Ala, Gly, Ser, Thr,
Glu, Asp, Gln, Asn, and Cys.

In a preferred embodiment, the IbA protein of the
invention has a sequence that differs from a wild-type human
IFN-β protein in at least one amino acid position selected
from positions 6, 13, 17, 21, 56, 59, 61, 62, 63, 66, 69,
84, 87, 91, 98, 102, 114, 118, 122, 129, 146, 150, 154, 157,
160, and 161; see also FIG. 3, which outlines sets of amino
acid positions.

Preferred amino acids for each position, including the
human IFN-β residues, are shown in FIGS. 4-16 (SEQ ID
NOS:4-24). Thus, for example, for the A-chain of an IbA
protein, at position 13, preferred amino acids are Phe, Tyr,
Glu, and Ala; at position 17, a preferred amino acid is Asp;
at position 69, a preferred amino acid is Val; at position 84
a preferred amino acid is Ile; at position 87, a preferred
amino acid is Phe; at position 91, a preferred amino acid is
Ile; at position 98, a preferred amino acid is Phe; at position
118, preferred amino acids are Ala, Val, and Cys; at position
122, preferred amino acids are Ile and Val; at position 146,
a preferred amino acid is Ile; at position 157, a preferred
amino acid is Leu; and at position 161, preferred amino
acids are Ala and Cys.

For the B-chain of an IbA protein, at position 13, preferred
amino acids are Leu and Glu; at position 17, preferred amino
acids are Ala and Thr; at position 56, a preferred amino acid
is Leu; at position 63, a preferred amino acid is Phe; at
position 84 a preferred amino acid is Ile; at position 87, a
preferred amino acid is Phe; at position 91, a preferred
amino acid is Ile; at position 114, preferred amino acids are
Phe and Leu; at position 118, preferred amino acids are Leu
and Glu; at position 122, preferred amino acids are Ile and
Phe; and at position 161, preferred amino acids are Ala and
Glu. Preferred changes are as follows: L6A; L6F; S13F;
S13Y; S13L; S13I; S13A; S13G; S13G; S13T; S13C; S13E;
C17A; C17L; C17V; C17D; C17T; C17I; C17E; C17S;
C17G; L21I; L21V; L21A; L21Y; L21F; A56L; I59V;
I59A; I59L; M62I; M62V; M62L; L63A; L63F; L63Y;
I66L; I66V; I66A; I69V; I69L; I69A; V84I; V84L; V84A;
L87F; L87I; L87Y; L87V; L87A; L87W; V91I; V91A;
V91L; V91F; V91Y; V98A; L98F; G114F; G114L; S118A;
S118V; S118C; S118L; S118E; L122I; L122V; L122A;
L122F; L122Y; L122W; I129V; I129L; I129A; V146I; 
V146A; I150V; I150A; I150L; I150F; F154L; F154Y; 
F157V; I157V; I157L; I157A; L160I; L160V; L160A; 
L160F; L160Y; T115A; T161V; T161I; T161D; T161C; 
T161E; and T161G. These may be done either individually 
or in combination, with any combination being possible. 
However, as outlined herein, preferred embodiments utilize 
at least five, and preferably more, variable positions in each
IbA protein.

Particularly preferred sequences for IbA proteins are
selected from the group consisting of: [V84I and L87F
(FIG. 4B and FIG. 10B) (SEQ ID NOS:4,18)]; [V84I, V91I,
L98F, L122I, and I157L (see FIG. 5B) (SEQ ID NO:5)];
[S13F, I69V, V84I, V91I, L98F, S118A, L122I, V146I,
I157L, and T161A (see FIG. 6B) (SEQ ID NO:6)]; [S13Y,
I69V, V84I, V91I, L98F, S118V, L122V, V146I, I157L, and
T161A (see FIG. 6C) (SEQ ID NO:7)]; [S13F, V84I, V91I,
L98F, S118A, L122I, I157L, and T161A (see FIG. 6D) (SEQ
ID NO:8)]; [S13F, C17D, I69V, V84I, V91I, L98F, S118A,
L122I, V146I, I157L, and T161A (see FIG. 7B) (SEQ ID
NO:9)]; [S13Y, C17D, I69V, V84I, V91I, L98F, S118V,
L122A, V146I, I157L, and T161A (see FIG. 7C) (SEQ ID
NO:10)]; [S13F, C17D, V84I, V91I, L98F, S118A, L122I,
I157L, and T161A (see FIG. 7D) (SEQ ID NO:11)]; [S13E,
C17D, V84I, V91I, S118C, V146I, and T161C (see FIG. 8B)
(SEQ ID NO:12)]; [S13A, V84I, V91I, S118C, V146I,
I157L, and T161C (see FIG. 8C) (SEQ ID NO:13)]; [S13E,
C17D, V84I, V91I, S118C, and T161C (see FIG. 8D) (SEQ
ID NO:14)]; [S13E, C17D, I69V, V84I, V91I, S118A,
L122I, V146I, I157L, and T161A (see FIG. 9B) (SEQ ID
NO:15)]; [S13E, C17D, V84I, V91I, S118A, V146I, and
I157L (see FIG. 9C) (SEQ ID NO:16)]; [S13E, C17D, V84I,
V91I, S118A, L122I, I157L, and T161A (see FIG. 9D) (SEQ
ID NO:17)]; [A56L, L63F, V84I, L87F, V91I, and L122F
(see FIG. 11B) (SEQ ID NO:19)]; [S13L, A56L, V84I,
V91I, G114F, S118L, L122I, and T161A (see FIG. 12B)
(SEQ ID NO:20)]; [S13L, C17A, A56L, V84I, L87F, V91L,
G114F, S118L, L122I, and T161E (see FIG. 13B) (SEQ ID
NO:21)]; [S13E, A56L, V84I, V91I, G114L, S118E, and
T161E (see FIG. 14B) (SEQ ID NO:22)]; [C17T, A56L,
V84I, V84I, V91I, S118E, G114L, S118E, and T161E (see
FIG. 15B) (SEQ ID NO:23)]; and [C17T, A56L, V84I, V91I,
S118E, and T161E (see FIG. 16B) (SEQ ID NO:24)].

Particularly preferred sequences for the A-chain of an IbA
protein are selected from the group consisting of: [V84I and
L87F (FIG. 4B) (SEQ ID NO:4)]; [V84I, V91I, L98F,
L122I, and I157L (see FIG. 5B) (SEQ ID NO:5)]; [S13F,
I69V, V84I, V91I, L98F, S118A, L122I, V146I, I157L, and
T161A (see FIG. 6B) (SEQ ID NO:6)]; [S13Y, I69V, V84I,
V91I, L98F, S118V, L122V, V146I, I157L, and T161A (see
FIG. 6C) (SEQ ID NO:7)]; [S13F, V84I, V91I, L98F,
S118A, L122I, I157L, and T161A (see FIG. 6D) (SEQ ID
NO:8)]; [S13F, C17D, I69V, V84I, V91I, L98F, S118A,
L122I, V146I, I157L, and T161A (see FIG. 7B) (SEQ ID
NO:9)]; [S13Y, C17D, I69V, V84I, V91I, L98F, S118V,
L122V, V146I, I157L, and T161A (see FIG. 7C) (SEQ ID
NO:10)]; [S13F, C17D, V84I, V91I, L98F, S118A, L122I,
I157L, and T161A (see FIG. 7D) (SEQ ID NO:11)]; [S13E,
C17D, V84I, V91I, S118C, V146I, and T161C (see FIG. 8B)
(SEQ ID NO:12)]; [S13A, V84I, V91I, S118C, V146I,
I157L, and T161C (see FIG. 8C) (SEQ ID NO:13)]; [S13E,
C17D, V84I, V91I, S118C, and T161C (see FIG. 8D) (SEQ
ID NO:14)]; [S13E, C17D, I69V, V84I, V91I, S118A,
L122I, V146I, I157L, and T161A (see FIG. 9B) (SEQ ID
NO:15)]; [S13E, C17D, V84I, V91I, S118A, V146I, and
I157L (see FIG. 9C) (SEQ ID NO:16)]; and [S13E, C17D,
V84I, V91I, S118A, L122I, I157L, and T161A (see FIG. 9D)
(SEQ ID NO:17)].

Particularly preferred sequences for the B-chain of an IbA
protein are selected from the group consisting of: [V84I and
L87F (FIG. 10B) (SEQ ID NO:18)]; [A56L, L63F, V84I,
L87F, V91I, and L122F (see FIG. 11B) (SEQ ID NO:19)];
[S13L, A56L, V84I, V91I, G114F, S118L, L122I, and T161A (see
FIG. 12B) (SEQ ID NO:20)]; [S13L, C17A, A56L, V84I,
L87F, V91L, G114F, S118L, L122I, and T161E (see FIG.
13B) (SEQ ID NO:21)]; [S13E, A56L, V84I, V91I, G114L,
S118E, and T161E (see FIG. 14B) (SEQ ID NO:22)];
[C17T, A56L, V84I, V91I, G114L, S118E, and T161E (see
FIG. 15B) (SEQ ID NO:23)]; and [C17T, A56L, V84I, V91I,
S118E, and T161E (see FIG. 16B) (SEQ ID NO:24)].

In a preferred embodiment, the IbA proteins of the
invention are human IFN-β conformers. By "conformer" herein is
meant a protein that has a protein backbone 3D structure that
is virtually the same but has significant differences in the
amino acid side chains. That is, the IbA proteins of the
invention define a conformer set, wherein all of the proteins
of the set share a backbone structure and yet have sequences
that differ by at least 3-5%. The three dimensional backbone
structure of an IbA protein thus substantially corresponds to
the three dimensional backbone structure of human IFN-β.
"Backbone" in this context means the non-side chain atoms:
the nitrogen, carbonyl carbon and oxygen, and the α-carbon
and the hydrogens attached to the nitrogen and α-carbon. To
be considered a conformer, a protein must have backbone
atoms that are no more than 2 Å from the human IFN-β
structure, with no more than 1.5 Å being preferred, and no
more than 1 Å being particularly preferred. In general, these
distances may be determined in two ways. In one
embodiment, each potential conformer is crystallized and its
three dimensional structure determined. Alternatively, as the
former is quite tedious, the sequence of each potential
conformer is run in the PDA program to determine whether
it is a conformer.

IbA proteins may also be identified as being encoded by
IbA nucleic acids. In the case of the nucleic acid, the overall
homology of the nucleic acid sequence is commensurate
with amino acid homology but takes into account the
degeneracy in the genetic code and codon bias of different
organisms. Accordingly, the nucleic acid sequence
homology may be either lower or higher than that of the protein
sequence, with lower homology being preferred.

In a preferred embodiment, an IbA nucleic acid encodes
an IbA protein. As will be appreciated by those in the art, due
to the degeneracy of the genetic code, an extremely large
number of nucleic acids may be made, all of which encode
the IbA proteins of the present invention. Thus, having
identified a particular amino acid sequence, those skilled in
the art could make any number of different nucleic acids, by
simply modifying the sequence of one or more codons in a
way which does not change the amino acid sequence of the
IbA.

In one embodiment, the nucleic acid homology is
determined through hybridization studies. Thus, for example,
nucleic acids which hybridize under high stringency to the
nucleic acid sequence shown in FIG. 1 or its complement
and encode an IbA protein are considered an IbA gene.

High stringency conditions are known in the art; see for
example Maniatis et al., Molecular Cloning: A Laboratory
Manual, 2d Edition, 1989, and Short Protocols in Molecular
Biology, ed. Ausubel, et al., both of which are hereby
incorporated by reference. Stringent conditions are
sequence-dependent and will be different in different circumstances.
Longer sequences hybridize specifically at about 0.5%, more preferably at least about 5% by weight of

highertemperatures. An extensive guide to the hybridization the total protein in a given sample. A substantially pure 

of nucleic acids is found in Tijssen, Techniques in Biochem- p^tein comprises at least about 75% by weight of the total 

istry and Motolar Biology-Hybridization with Nucleic ^-.^ .^out 80% being preferred, and at least 

Acid Probes, "Overview of prmciples of hybridization and 5 V , oncy u- c ^ . ^ . . 

the strategy of nucleic acid assays" (1993). Generally, ^^^^^^ ^0% being particularly preferred. The defimtion 

stringent conditions are selected to be about S-IO** C lower '^'^^"^^^ production of an IbA protein from one organism 

than the thermal melting point (T J for the specific sequence ^ different organism or host cell. Alternatively, the protein 

at a defined ionic strength and pH. The T^ is the temperature '"^de at a significantly higher concentration than is 

(under defined ionic strength, pH and nucleic acid lO normally seen, through the use of an inducible promoter or 

concentration) at which 50% of the probes complementary high expression promoter, such that the protein is made at 

to the target hybridize to the target sequence at equilibrium increased concentration levels. Furthermore, all of the IbA 

(as the target sequences are present in excess, at T,„, 50% of proteins outlined herein are in a form not normally found in 

the probes are occupied at equilibrium). Stringent conditions nature, as they contain amino acid substimtions, insertions 

^ t 1 n M 'V T'^u^'ll^c.^ ^''f n^'x? d^l^ti^"^' ^ith substimUons being preferred, as dis- 

about 1.0 M sodium ion, typically about 0.01 to 1.0 M cussed below 

sodium ion concentration (or other salts) at pH 7.0 to 8.3 and ^ 

the temperature is at least about 30** C. for short probes (e.g. included within the definition of IbA proteins of the 

10 to 50 nucleotides) and at least about 60° C. for long present invention are amino acid sequence variants of the 

probes (e.g. greater than 50 nucleotides). Stringent condi- 20 IbA sequences outlined herein and shown in the Figures, 

tionsmay also be achieved with the addition of destabilizing That is, the IbA proteins may contain additional variable 

agents such as formamide. positions as compared to human IFN-p. These variants fall 

In another embodiment, less stringent hybridization con- into one or more of three classes: substitutional, insertional 

ditioi^ are used; for example moderate or low stringency ^..^^^^^ T^ese variants ordinarily are prepared 

conditions may be used, as are known m the art; see Maniatis 25 k„ .if^ ™^:fi , • ^ i *j - .u t^m* 

and Ausubel, supra, and Tijssen, supra. ^ f ! "'^^'S'""'^' nucleotdes in the DNA 

The IbA proteins and nucleic acids of the present inven- ^n^^^ding an IbA protein, using cassette or PGR mutagenesis 
tion are recombinant. As used herein, "nucleic acid" may ^^^^^ techniques well known in the art, to produce DNA 
refer to either DNA or RNA, or molecules which contain encoding the variant, and thereafter expressing the DNA in 
both deoxy- and ribonucleotides. The nucleic acids include 30 recombinant cell culture as outlined above. However, vari- 
genomic DNA, cDNA and oligonucleotides including sense ant IbA protein fragments having up to about 100-150 
and an ti -sense nucleic acids. Such nucleic acids may also residues may be prepared by in vitro synthesis using estab- 
contain modifications in the ribose -phosphate backbone to fished techniques. Amino acid sequence variants are char- 
increase stabiUty and half life of such molecules in physi- acterized by the predetermined nature of the variation, a 
ological environments. , , , . , feature that sets them apart from naturally occurring allelic 

The nucleic acid may be double stranded, smgle stranded, • , • *• r .u iua . - • 

r.r ^ f u 41, J ui * J J 1 or interspecies variation of the IbA pro tern ammo acid 

or contam portions of both double stranded or smele ™ ....... 

stranded sequence. As will be appreciated by those in the art, ^^^^ce. The variants typically exhibit the same quahtative 

the depiction of a single strand ("Watson*0 also defines the biological activity as the naturally occurring analogue, 

sequence of the other strand ("Crick"); thus the sequence 40 although variants can also be selected which have modified 

depicted in FIG. 1 also includes the complement of the characteristics as will be more fully outlined below, 

sequence. By the term "recombinant nucleic acid" herein is While the site or region for introducing an amino acid 

meant nucleic acid, originally formed in vitro, in general, by sequence variation is predetermined, the mutation per se 

the mampujation of nucleic acidby endonucleases in a form n^ed not be predetermined. For example, in order to opti- 

not normally found in nature. Thus an isolated IbA nucleic 45 _X n . / • / 

acid, in a Unear form, or an expression vector formed in vitro "'^^ performance of a mutation at a given site, random 

by figating DNAmolecules that are not normally joined, are ^^'^^^^^^ °^ay be conducted at the target codon or region 

both considered recombinant for the purposes of this inven- expressed IbA variants screened for the optimal 

tion. It is understood that once a recombinant nucleic acid is combination of desired activity. Techniques for making 

made and reintroduced into a host cell or organism, it will 50 substitution mutations at predetermined sites in DNA having 

replicate non-recombinantly, i.e. using the in vivo cellular a known sequence are well known, for example, M13 primer 

machinery of the host cell rather than in vitro manipulations; mutagenesis and PGR mutagenesis. Screening of the 

however, such nucleic acids, once produced recombinantly, mutants is done using assays of IbA protein activities, 

ahhough subsequently replicated non-recombinantly, are Amino acid substitutions are typicaUy of single residues; 

still considered recombmant for the purposes of the inven- 55 insertions usually will be on the order of from about 1 to 20 

-1 1 « i_. .... . . amino acids, although considerably larger insertions may be 

Similarly, a recombmant protein is a protem made using ^^i^,^^^^^ Deletions range from about 1 to about 20 residues, 

recombinant techniques, i.e. through the expression of a oith«..rrK ^r..^ a i «■ u u i 

u- . 1 ■ J J ■ . J u * . ■ although in some cases deletions may be much lareer. 

recombinant nucleic acid as depicted above. A recombinant j b 

protein is distinguished from naturally occurring protein by 60 Substitutions, deletions, insertions or any combination 

at least one or more characteristics. For example, the protein thereof may be used to arrive at a final derivative. Generally 

may be isolated or purified away from some or all of the ^^^^ changes are done on a few amino acids to minimize the 

proteins and compounds with which it is normally associ- alteration of the molecule. However, larger changes may be 

ated in its wild type host, and thus may be substantially pure. tolerated in certain circumstances. When small alterations in 

For example, an isolated protein is unaccompanied by at 65 the characteristics of the IbA protein are desired, substitu- 

least some of the material with which it is normally asso- lions arc generally made in accordance with the following 

ciated in its natural state, preferably constituting at least chart: 



CHART I

Original Residue    Exemplary Substitutions
Ala                 Ser
Arg                 Lys
Asn                 Gln, His
Asp                 Glu
Cys                 Ser, Ala
Gln                 Asn
Glu                 Asp
Gly                 Pro
His                 Asn, Gln
Ile                 Leu, Val
Leu                 Ile, Val
Lys                 Arg, Gln, Glu
Met                 Leu, Ile
Phe                 Met, Leu, Tyr
Ser                 Thr
Thr                 Ser
Trp                 Tyr
Tyr                 Trp, Phe
Val                 Ile, Leu
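
As an illustration only, the chart above can be encoded as a lookup table and used to check whether a point mutation written in the style used in this specification (e.g. V84I) falls among the exemplary substitutions. The following is a minimal sketch and not part of the disclosed method: the names CHART_I, ONE_TO_THREE, parse_mutation and is_chart_substitution are hypothetical, and the Asn row is read from a partly illegible entry in the chart and should be treated as an assumption.

```python
import re

# Chart I as read from the chart above: original residue -> exemplary substitutions.
# Assumption: the Asn row (Gln, His) is reconstructed from a partly illegible entry.
CHART_I = {
    "Ala": {"Ser"},
    "Arg": {"Lys"},
    "Asn": {"Gln", "His"},
    "Asp": {"Glu"},
    "Cys": {"Ser", "Ala"},
    "Gln": {"Asn"},
    "Glu": {"Asp"},
    "Gly": {"Pro"},
    "His": {"Asn", "Gln"},
    "Ile": {"Leu", "Val"},
    "Leu": {"Ile", "Val"},
    "Lys": {"Arg", "Gln", "Glu"},
    "Met": {"Leu", "Ile"},
    "Phe": {"Met", "Leu", "Tyr"},
    "Ser": {"Thr"},
    "Thr": {"Ser"},
    "Trp": {"Tyr"},
    "Tyr": {"Trp", "Phe"},
    "Val": {"Ile", "Leu"},
}

# Standard one-letter to three-letter amino acid codes.
ONE_TO_THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def parse_mutation(label):
    """Split a label such as 'V84I' into (original, position, replacement)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if match is None:
        raise ValueError(f"not a point-mutation label: {label!r}")
    original, position, replacement = match.groups()
    return ONE_TO_THREE[original], int(position), ONE_TO_THREE[replacement]


def is_chart_substitution(label):
    """Return True if the replacement residue is listed in Chart I for the original residue."""
    original, _, replacement = parse_mutation(label)
    return replacement in CHART_I.get(original, set())


if __name__ == "__main__":
    for label in ["V84I", "L122I", "S13F", "T161A"]:
        print(label, is_chart_substitution(label))
```

Under this encoding, V84I and L122I are listed in the chart, while S13F and T161A are not; the latter two are therefore less conservative substitutions in the sense of the chart.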
Substantial changes in function or immunological identity are made by selecting substitutions that are less conservative than those shown in Chart I. For example, substitutions may be made which more significantly affect: the structure of the polypeptide backbone in the area of the alteration, for example the alpha-helical or beta-sheet structure; the charge or hydrophobicity of the molecule at the target site; or the bulk of the side chain. The substitutions which in general are expected to produce the greatest changes in the polypeptide's properties are those in which (a) a hydrophilic residue, e.g. seryl or threonyl, is substituted for (or by) a hydrophobic residue, e.g. leucyl, isoleucyl, phenylalanyl, valyl or alanyl; (b) a cysteine or proline is substituted for (or by) any other residue; (c) a residue having an electropositive side chain, e.g. lysyl, arginyl, or histidyl, is substituted for (or by) an electronegative residue, e.g. glutamyl or aspartyl; or (d) a residue having a bulky side chain, e.g. phenylalanine, is substituted for (or by) one not having a side chain, e.g. glycine.

The variants typically exhibit the same qualitative biological activity and will elicit the same immune response as the original IbA protein, although variants also are selected to modify the characteristics of the IbA proteins as needed. Alternatively, the variant may be designed such that the biological activity of the IbA protein is altered. For example, glycosylation sites may be altered or removed. Similarly, the biological function may be altered; for example, in some instances it may be desirable to have more or less potent IFN-β activity.

The IbA proteins and nucleic acids of the invention can be made in a number of ways. Individual nucleic acids and proteins can be made as known in the art and outlined below. Alternatively, libraries of IbA proteins can be made for testing.

In a preferred embodiment, sets or libraries of IbA proteins are generated from a probability distribution table. As outlined herein, there are a variety of methods of generating a probability distribution table, including using PDA, sequence alignments, forcefield calculations such as SCMF calculations, etc. In addition, the probability distribution can be used to generate information entropy scores for each position, as a measure of the mutational frequency observed in the library.

In this embodiment, the frequency of each amino acid residue at each variable position in the list is identified. Frequencies can be thresholded, wherein any variant frequency lower than a cutoff is set to zero. This cutoff is preferably 1%, 2%, 5%, 10% or 20%, with 10% being particularly preferred. These frequencies are then built into the IbA library. That is, as above, these variable positions are collected and all possible combinations are generated, but the amino acid residues that "fill" the library are utilized on a frequency basis. Thus, in a non-frequency based library, a variable position that has 5 possible residues will have 20% of the proteins comprising that variable position with the first possible residue, 20% with the second, etc. However, in a frequency based library, a variable position that has 5 possible residues with frequencies of 10%, 15%, 25%, 30% and 20%, respectively, will have 10% of the proteins comprising that variable position with the first possible residue, 15% of the proteins with the second residue, 25% with the third, etc. As will be appreciated by those in the art, the actual frequency may depend on the method used to actually generate the proteins; for example, exact frequencies may be possible when the proteins are synthesized. However, when the frequency-based primer system outlined below is used, the actual frequencies at each position will vary, as outlined below.

As will be appreciated by those in the art and outlined herein, probability distribution tables can be generated in a variety of ways. In addition to the methods outlined herein, self-consistent mean field (SCMF) methods can be used in the direct generation of probability tables. SCMF is a deterministic computational method that uses a mean field description of rotamer interactions to calculate energies. A probability table generated in this way can be used to create libraries as described herein. SCMF can be used in three ways: the frequencies of amino acids and rotamers for each amino acid are listed at each position; the probabilities are determined directly from SCMF (see Delarue et al., Pac. Symp. Biocomput. 109-21 (1997), expressly incorporated by reference). In addition, highly variable positions and non-variable positions can be identified. Alternatively, another method is used to determine what sequence is jumped to during a search of sequence space; SCMF is used to obtain an accurate energy for that sequence; this energy is then used to rank it and create a rank-ordered list of sequences (similar to a Monte Carlo sequence list). A probability table showing the frequencies of amino acids at each position can then be calculated from this list (Koehl et al., J. Mol. Biol. 239:249 (1994); Koehl et al., Nat. Struct. Biol. 2:163 (1995); Koehl et al., Curr. Opin. Struct. Biol. 6:222 (1996); Koehl et al., J. Mol. Biol. 293:1183 (1999); Koehl et al., J. Mol. Biol. 293:1161 (1999); Lee, J. Mol. Biol. 236:918 (1994); and Vasquez, Biopolymers 36:53-70 (1995); all of which are expressly incorporated by reference). Similar methods include, but are not limited to, OPLS-AA (Jorgensen, et al., J. Am. Chem. Soc. (1996), v 118, pp 11225-11236; Jorgensen, W. L.; BOSS, Version 4.1; Yale University: New Haven, Conn. (1999)); OPLS (Jorgensen, et al., J. Am. Chem. Soc. (1988), v 110, pp 1657ff; Jorgensen, et al., J. Am. Chem. Soc. (1990), v 112, pp 4768ff); UNRES (United Residue Forcefield; Liwo, et al., Protein Science (1993), v 2, pp 1697-1714; Liwo, et al., Protein Science (1993), v 2, pp 1715-1731; Liwo, et al., J. Comp. Chem. (1997), v 18, pp 849-873; Liwo, et al., J. Comp. Chem. (1997), v 18, pp 874-884; Liwo, et al., J. Comp. Chem. (1998), v 19, pp 259-276; Forcefield for Protein Structure Prediction (Liwo, et al., Proc. Natl. Acad. Sci. USA (1999), v 96, pp 5482-5485); ECEPP/3 (Liwo et al., J Protein Chem 1994 May; 13(4): 375-80); AMBER 1.1 force field (Weiner, et al., J. Am. Chem. Soc. v106, pp 765-784); AMBER 3.0 force field (U.C. Singh et al.
Proc. Natl. Acad. Sci. USA. 82:755-759); CHARMM and CHARMM22 (Brooks, et al., J. Comp. Chem. v4, pp 187-217); cvff3.0 (Dauber-Osguthorpe, et al., (1988) Proteins: Structure, Function and Genetics, v4, pp 31-47); cff91 (Maple, et al., J. Comp. Chem. v15, 162-182); also, the DISCOVER (cvff and cff91) and AMBER forcefields are used in the INSIGHT molecular modeling package (Biosym/MSI, San Diego Calif.) and CHARMM is used in the QUANTA molecular modeling package (Biosym/MSI, San Diego Calif.).

In addition, as outlined herein, a preferred method of generating a probability distribution table is through the use of sequence alignment programs. In addition, the probability table can be obtained by a combination of sequence alignments and computational approaches. For example, one can add amino acids found in the alignment of homologous sequences to the result of the computation. Preferably, one can add the wild type amino acid identity to the probability table if it is not found in the computation.

As will be appreciated, an IbA library created by recombining variable positions and/or residues at the variable position may not be in a rank-ordered list. In some embodiments, the entire list may just be made and tested. Alternatively, in a preferred embodiment, the IbA library is also in the form of a rank ordered list. This may be done for several reasons, including that the size of the library is still too big to generate experimentally, or for predictive purposes. This may be done in several ways. In one embodiment, the library is ranked using the scoring functions of PDA to rank the library members. Alternatively, statistical methods could be used. For example, the library may be ranked by frequency score; that is, proteins containing the most high frequency residues could be ranked higher, etc. This may be done by adding or multiplying the frequency at each variable position to generate a numerical score. Similarly, the different positions of the library could be weighted and then the proteins scored; for example, those containing certain residues could be arbitrarily ranked.

In a preferred embodiment, the different protein members of the IbA library may be chemically synthesized. This is particularly useful when the designed proteins are short, preferably less than 150 amino acids in length, with less than 100 amino acids being preferred, and less than 50 amino acids being particularly preferred, although as is known in the art, longer proteins can be made chemically or enzymatically. See for example Wilken et al., Curr. Opin. Biotechnol. 9:412-26 (1998), hereby expressly incorporated by reference.

In a preferred embodiment, particularly for longer proteins or proteins for which large samples are desired, the library sequences are used to create nucleic acids such as DNA which encode the member sequences and which can then be cloned into host cells, expressed and assayed, if desired. Thus, nucleic acids, and particularly DNA, can be made which encode each member protein sequence. This is done using well known procedures. The choice of codons, suitable expression vectors and suitable host cells will vary depending on a number of factors, and can be easily optimized as needed.

In a preferred embodiment, multiple PCR reactions with pooled oligonucleotides are done, as is generally depicted in FIG. 17. In this embodiment, overlapping oligonucleotides are synthesized which correspond to the full length gene. Again, these oligonucleotides may represent all of the different amino acids at each variant position or subsets.

In a preferred embodiment, these oligonucleotides are pooled in equal proportions and multiple PCR reactions are performed to create full length sequences containing the combinations of mutations defined by the library. In addition, this may be done using error-prone PCR methods.

In a preferred embodiment, the different oligonucleotides are added in relative amounts corresponding to the probability distribution table. The multiple PCR reactions thus result in full length sequences with the desired combinations of mutations in the desired proportions.

The total number of oligonucleotides needed is a function of the number of positions being mutated and the number of mutations being considered at these positions: (number of oligos for constant positions) + M1 + M2 + M3 + ... + Mn = (total number of oligos required), where Mn is the number of mutations considered at position n in the sequence.

In a preferred embodiment, each overlapping oligonucleotide comprises only one position to be varied; in alternate embodiments, the variant positions are too close together to allow this and multiple variants per oligonucleotide are used to allow complete recombination of all the possibilities. That is, each oligo can contain the codon for a single position being mutated, or for more than one position being mutated. The multiple positions being mutated must be close in sequence to prevent the oligo length from being impractical. For multiple mutating positions on an oligonucleotide, particular combinations of mutations can be included or excluded in the library by including or excluding the oligonucleotide encoding that combination. For example, as discussed herein, there may be correlations between variable regions; that is, when position X is a certain residue, position Y must (or must not) be a particular residue. These sets of variable positions are sometimes referred to herein as a "cluster". When the clusters are comprised of residues close together, and thus can reside on one oligonucleotide primer, the clusters can be set to the "good" correlations, and eliminate the bad combinations that may decrease the effectiveness of the library. However, if the residues of the cluster are far apart in sequence, and thus will reside on different oligonucleotides for synthesis, it may be desirable to either set the residues to the "good" correlation, or eliminate them as variable residues entirely. In an alternative embodiment, the library may be generated in several steps, so that the cluster mutations only appear together. This procedure, i.e. the procedure of identifying mutation clusters and either placing them on the same oligonucleotides or eliminating them from the library or library generation in several steps preserving clusters, can considerably enrich the experimental library with properly folded protein. Identification of clusters can be carried out in a number of ways, e.g. by using known pattern recognition methods, comparisons of frequencies of occurrence of mutations or by using energy analysis of the sequences to be experimentally generated (for example, if the energy of interaction is high, the positions are correlated). These correlations may be positional correlations (e.g. variable positions 1 and 2 always change together or never change together) or sequence correlations (e.g. if there is residue A at position 1, there is always residue B at position 2). See: Pattern discovery in Biomolecular Data: Tools, Techniques, and Applications; edited by Jason T. L. Wang, Bruce A. Shapiro, Dennis Shasha. New York: Oxford University, 1999; Andrews, Harry C. Introduction to mathematical techniques in pattern recognition; New York, Wiley-Interscience [1972]; Applications of Pattern Recognition; Editor, K. S. Fu. Boca Raton, Fla. CRC Press, 1982; Genetic Algorithms for Pattern Recognition; edited by Sankar K. Pal, Paul P. Wang. Boca Raton: CRC Press, c1996; Pandya, Abhijit S., Pattern recognition with neural networks in C++/Abhijit S. Pandya,
Robert B. Macy. Boca Raton, Fla.: CRC Press, 1996; Handbook of pattern recognition & computer vision / edited by C. H. Chen, L. F. Pau, P. S. P. Wang. 2nd ed. Singapore; River Edge, NJ: World Scientific, c1999; Friedman, Introduction to Pattern Recognition: Statistical, Structural, Neural [...] perception and artificial intelligence; vol. 32; all of which are expressly incorporated by reference. In addition, programs used to search for consensus motifs can be used as well.

In addition, correlations and shuffling can be fixed or optimized by altering the design of the oligonucleotides; that is, by deciding where the oligonucleotides (primers) start and stop (e.g. where the sequences are "cut"). The start and stop sites of oligos can be set to maximize the number of clusters that appear in single oligonucleotides, thereby enriching the library with higher scoring sequences. Different oligonucleotide start and stop site options can be computationally modeled and ranked according to number of clusters that are represented on single oligos, or the percentage of the resulting sequences consistent with the predicted library of sequences.

The total number of oligonucleotides required increases when multiple mutable positions are encoded by a single oligonucleotide. The annealed regions are the ones that remain constant, i.e. have the sequence of the reference sequence.

Oligonucleotides with insertions or deletions of codons can be used to create a library expressing different length proteins. In particular, computational sequence screening for insertions or deletions can result in secondary libraries defining different length proteins, which can be expressed by a library of pooled oligonucleotides of different lengths.

In a preferred embodiment, the IbA library is done by shuffling the family (e.g. a set of variants); that is, some set of the top sequences (if a rank-ordered list is used) can be shuffled, either with or without error-prone PCR. "Shuffling" in this context means a recombination of related sequences, generally in a random way. It can include "shuffling" as defined and exemplified in U.S. Pat. Nos. 5,830,721; 5,811,238; 5,605,793; 5,837,458 and PCT US/19256, all of which are expressly incorporated by reference in their entirety. This set of sequences can also be an artificial set; for example, from a probability table (for example generated using SCMF) or a Monte Carlo set. Similarly, the "family" can be the top 10 and the bottom 10 sequences, the top 100 sequences, etc. This may also be done using error-prone PCR.

Thus, in a preferred embodiment, in silico shuffling is done using the computational methods described herein. That is, starting with either two libraries or two sequences, random recombinations of the sequences can be generated and evaluated.

In a preferred embodiment, error-prone PCR is done to generate the IbA library. See U.S. Pat. Nos. 5,605,793, 5,811,238, and 5,830,721, all of which are hereby incorporated by reference. This can be done on the optimal sequence or on top members of the library, or some other artificial set or family. In this embodiment, the gene for the optimal sequence found in the computational screen of the primary library can be synthesized. Error prone PCR is then performed on the optimal sequence gene in the presence of oligonucleotides that code for the mutations at the variant positions of the library (bias oligonucleotides). The addition of the oligonucleotides will create a bias favoring the incorporation of the mutations in the library. Alternatively, only oligonucleotides for certain mutations may be used to bias the library.

In a preferred embodiment, gene shuffling with error prone PCR can be performed on the gene for the optimal sequence, in the presence of bias oligonucleotides, to create a DNA sequence library that reflects the proportion of the mutations found in the IbA library. The choice of the bias oligonucleotides can be done in a variety of ways; they can be chosen on the basis of their frequency, i.e. oligonucleotides encoding high mutational frequency positions can be used; alternatively, oligonucleotides containing the most variable positions can be used, such that the diversity is increased; if the secondary library is ranked, some number of top scoring positions can be used to generate bias oligonucleotides; random positions may be chosen; a few top scoring and a few low scoring ones may be chosen; etc. What is important is to generate new sequences based on preferred variable positions and sequences.

In a preferred embodiment, PCR using a wild type gene or other gene can be used, as is schematically depicted in FIG. 18. In this embodiment, a starting gene is used; generally, although this is not required, the gene is usually the wild type gene. In some cases it may be the gene encoding the global optimized sequence, or any other sequence of the list, or a consensus sequence obtained e.g. from aligning homologous sequences from different organisms. In this embodiment, oligonucleotides are used that correspond to the variant positions and contain the different amino acids of the library. PCR is done using PCR primers at the termini, as is known in the art. This provides two benefits; the first is that this generally requires fewer oligonucleotides and can result in fewer errors. In addition, it has experimental advantages in that if the wild type gene is used, it need not be synthesized.

In addition, there are several other techniques that can be used, as exemplified in the figures, e.g. FIGS. 19-21. In a preferred embodiment, ligation of PCR products is done.

In a preferred embodiment, a variety of additional steps may be done to the IbA library; for example, further computational processing can occur, different IbA libraries can be recombined, or cutoffs from different libraries can be combined. In a preferred embodiment, an IbA library may be computationally remanipulated to form an additional IbA library (sometimes referred to herein as "tertiary libraries"). For example, any of the IbA library sequences may be chosen for a second round of PDA, by freezing or fixing some or all of the changed positions in the first library. Alternatively, only changes seen in the last probability distribution table are allowed. Alternatively, the stringency of the probability table may be altered, either by increasing or decreasing the cutoff for inclusion. Similarly, the IbA library may be recombined experimentally after the first round; for example, the best gene/genes from the first screen may be taken and gene assembly redone (using techniques outlined below, multiple PCR, error prone PCR, shuffling, etc.). Alternatively, the fragments from one or more good gene(s) may be used to change probabilities at some positions. This biases the search to an area of sequence space found in the first round of computational and experimental screening.

In a preferred embodiment, a tertiary library can be generated from combining different IbA libraries. For example, a probability distribution table from a first IbA library can be generated and recombined, either computationally or experimentally, as outlined herein. A PDA IbA library may be combined with a sequence alignment IbA library, and either recombined (again, computationally or experimentally) or just the cutoffs from each joined to make a new tertiary library. The top sequences from several libraries can be recombined. Sequences from the top of a
library can be combined with sequences from the bottom of the library to more broadly sample sequence space, or only sequences distant from the top of the library can be combined. IbA libraries that analyzed different parts of a protein can be combined to a tertiary library that treats the combined parts of the protein.

In a preferred embodiment, a tertiary library can be generated using correlations in an IbA library. That is, a residue at a first variable position may be correlated to a residue at a second variable position (or correlated to residues at additional positions as well). For example, two variable positions may sterically or electrostatically interact, such that if the first residue is X, the second residue must be Y. This may be either a positive or negative correlation.

Using the nucleic acids of the present invention which encode an IbA protein, a variety of expression vectors are made. The expression vectors may be either self-replicating extrachromosomal vectors or vectors which integrate into a host genome. Generally, these expression vectors include transcriptional and translational regulatory nucleic acid operably linked to the nucleic acid encoding the IbA protein. The term "control sequences" refers to DNA sequences necessary for the expression of an operably linked coding sequence in a particular host organism. The control sequences that are suitable for prokaryotes, for example, include a promoter, optionally an operator sequence, and a ribosome binding site. Eukaryotic cells are known to utilize promoters, polyadenylation signals, and enhancers.

Nucleic acid is "operably linked" when it is placed into a functional relationship with another nucleic acid sequence. For example, DNA for a presequence or secretory leader is operably linked to DNA for a polypeptide if it is expressed as a preprotein that participates in the secretion of the polypeptide; a promoter or enhancer is operably linked to a coding sequence if it affects the transcription of the sequence; or a ribosome binding site is operably linked to a coding sequence if it is positioned so as to facilitate translation.

In a preferred embodiment, when the endogenous secretory sequence leads to a low level of secretion of the naturally occurring protein or of the IbA protein, a replacement of the naturally occurring secretory leader sequence is desired. In this embodiment, an unrelated secretory leader sequence is operably linked to an IbA encoding nucleic acid leading to increased protein secretion. Thus, any secretory leader sequence resulting in enhanced secretion of the IbA protein, when compared to the secretion of IFN-β and its secretory sequence, is desired. Suitable secretory leader sequences that lead to the secretion of a protein are known in the art.

In another preferred embodiment, a secretory leader sequence of a naturally occurring protein or a protein is removed by techniques known in the art and subsequent expression results in intracellular accumulation of the recombinant protein.

Generally, "operably linked" means that the DNA sequences being linked are contiguous, and, in the case of a secretory leader, contiguous and in reading phase. However, enhancers do not have to be contiguous. Linking is accomplished by ligation at convenient restriction sites. If such sites do not exist, the synthetic oligonucleotide adaptors or linkers are used in accordance with conventional practice. The transcriptional and translational regulatory nucleic acid will generally be appropriate to the host cell used to express the fusion protein; for example, transcriptional and translational regulatory nucleic acid sequences from Bacillus are preferably used to express the fusion protein in Bacillus. Numerous types of appropriate expression vectors, and suitable regulatory sequences are known in the art for a variety of host cells.

In general, the transcriptional and translational regulatory sequences may include, but are not limited to, promoter sequences, ribosomal binding sites, transcriptional start and stop sequences, translational start and stop sequences, and enhancer or activator sequences. In a preferred embodiment, the regulatory sequences include a promoter and transcriptional start and stop sequences.

Promoter sequences encode either constitutive or inducible promoters. The promoters may be either naturally occurring promoters or hybrid promoters. Hybrid promoters, which combine elements of more than one promoter, are also known in the art, and are useful in the present invention. In a preferred embodiment, the promoters are strong promoters, allowing high expression in cells, particularly mammalian cells, such as the CMV promoter, particularly in combination with a Tet regulatory element.

In addition, the expression vector may comprise additional elements. For example, the expression vector may have two replication systems, thus allowing it to be maintained in two organisms, for example in mammalian or insect cells for expression and in a prokaryotic host for cloning and amplification. Furthermore, for integrating expression vectors, the expression vector contains at least one sequence homologous to the host cell genome, and preferably two homologous sequences which flank the expression construct. The integrating vector may be directed to a specific locus in the host cell by selecting the appropriate homologous sequence for inclusion in the vector. Constructs for integrating vectors are well known in the art.

In addition, in a preferred embodiment, the expression vector contains a selectable marker gene to allow the selection of transformed host cells. Selection genes are well known in the art and will vary with the host cell used.

A preferred expression vector system is a retroviral vector system such as is generally described in PCT/US97/01019 and PCT/US97/01048, both of which are hereby expressly incorporated by reference.

In a preferred embodiment, the expression vector comprises the components described above and a gene encoding an IbA protein. In this aspect, only one species of an IbA protein will be expressed in the cell comprising the expression vector. In one aspect of this embodiment, it is desired to express an optimized A-chain of IFN-β and an optimized B-chain of IFN-β within the same cell and thus, two expression vectors, one comprising a gene coding for an optimized A-chain of IFN-β, the other one comprising a gene coding for an optimized B-chain of IFN-β, are introduced into the same host cell. This allows formation of a preferred IbA dimer.

In another aspect of this embodiment, an expression vector is constructed that comprises two IbA genes encoding two different IbA proteins. In this embodiment, one IbA gene encodes an optimized A-chain of IFN-β and the second gene encodes an optimized B-chain of IFN-β. In one aspect of this embodiment, a polycistronic gene can be constructed as is known in the art for co-expression in a host cell.

As will be appreciated by those in the art, all combinations are possible and accordingly, as used herein, the combination of components, comprised by one or more vectors, which may be retroviral or not, is referred to herein as a "vector composition".

The IbA nucleic acids are introduced into the cells either alone or in combination with an expression vector. By "introduced into" or grammatical equivalents herein is
meant that the nucleic acids enter the cells in a manner suitable for subsequent expression of the nucleic acid. The method of introduction is largely dictated by the targeted cell type, discussed below. Exemplary methods include Ca3(PO4)2 precipitation, liposome fusion, lipofectin®, electroporation, viral infection, etc. The IbA nucleic acids may stably integrate into the genome of the host cell (for example, with retroviral introduction, outlined below), or may exist either transiently or stably in the cytoplasm (i.e. through the use of traditional plasmids, utilizing standard regulatory sequences, selection markers, etc.).

The IbA proteins of the present invention are produced by culturing a host cell transformed with an expression vector containing nucleic acid encoding an IbA protein, under the appropriate conditions to induce or cause expression of the IbA protein. The conditions appropriate for IbA protein expression will vary with the choice of the expression vector and the host cell, and will be easily ascertained by one skilled in the art through routine experimentation. For example, the use of constitutive promoters in the expression vector will require optimizing the growth and proliferation of the host cell, while the use of an inducible promoter requires the appropriate growth conditions for induction. In addition, in some embodiments, the timing of the harvest is important. For example, the baculoviral systems used in insect cell expression are lytic viruses, and thus harvest time selection can be crucial for product yield.

Appropriate host cells include yeast, bacteria, archaebacteria, fungi, and insect and animal cells, including mammalian cells. Of particular interest are Drosophila melanogaster cells, Saccharomyces cerevisiae and other yeasts, E. coli, Bacillus subtilis, SF9 cells, C129 cells, 293 cells, Neurospora, BHK, CHO, COS, Pichia pastoris, etc.

In a preferred embodiment, the IbA proteins are expressed in mammalian cells. Mammalian expression systems are also known in the art, and include retroviral systems. A mammalian promoter is any DNA sequence capable of binding mammalian RNA polymerase and initiating the downstream (3') transcription of a coding sequence for the fusion protein into mRNA. A promoter will have a transcription initiating region, which is usually placed proximal to the 5' end of the coding sequence, and a TATA box, usually located 25-30 base pairs upstream of the transcription initiation site. The TATA box is thought to direct RNA polymerase II to begin RNA synthesis at the correct site. A mammalian promoter will also contain an upstream promoter element (enhancer element), typically located within 100 to 200 base pairs upstream of the TATA box. An upstream promoter element determines the rate at which transcription is initiated and can act in either orientation. Of particular use as mammalian promoters are the promoters from mammalian viral genes, since the viral genes are often highly expressed and have a broad host range. Examples include the SV40 early promoter, mouse mammary tumor virus LTR promoter, adenovirus major late promoter, herpes simplex virus promoter, and the CMV promoter.

Typically, transcription termination and polyadenylation sequences recognized by mammalian cells are regulatory regions located 3' to the translation stop codon and thus, together with the promoter elements, flank the coding sequence. The 3' terminus of the mature mRNA is formed by site-specific post-translational cleavage and polyadenylation. Examples of transcription terminator and polyadenylation signals include those derived from SV40.

The methods of introducing exogenous nucleic acid into mammalian hosts, as well as other hosts, are well known in the art, and will vary with the host cell used. Techniques include dextran-mediated transfection, calcium phosphate precipitation, polybrene mediated transfection, protoplast fusion, electroporation, viral infection, encapsulation of the polynucleotide(s) in liposomes, and direct microinjection of the DNA into nuclei. As outlined herein, a particularly preferred method utilizes retroviral infection, as outlined in PCT/US97/01019, incorporated by reference.

As will be appreciated by those in the art, the type of mammalian cells used in the present invention can vary widely. Basically, any mammalian cells may be used, with mouse, rat, primate and human cells being particularly preferred, although as will be appreciated by those in the art, modifications of the system by pseudotyping allow all eukaryotic cells to be used, preferably higher eukaryotes. As is more fully described below, a screen will be set up such that the cells exhibit a selectable phenotype in the presence of a bioactive peptide. As is more fully described below, cell types implicated in a wide variety of disease conditions are particularly useful, so long as a suitable screen may be designed to allow the selection of cells that exhibit an altered phenotype as a consequence of the presence of a peptide within the cell.

Accordingly, suitable cell types include, but are not limited to, tumor cells of all types (particularly melanoma, myeloid leukemia, carcinomas of the lung, breast, ovaries, colon, kidney, prostate, pancreas and testes), cardiomyocytes, endothelial cells, epithelial cells, lymphocytes (T-cell and B cell), mast cells, eosinophils, vascular intimal cells, hepatocytes, leukocytes including mononuclear leukocytes, stem cells such as haemopoietic, neural, skin, lung, kidney, liver and myocyte stem cells (for use in screening for differentiation and de-differentiation factors), osteoclasts, chondrocytes and other connective tissue cells, keratinocytes, melanocytes, liver cells, kidney cells, and adipocytes. Suitable cells also include known research cells, including, but not limited to, Jurkat T cells, NIH3T3 cells, CHO, Cos, etc. See the ATCC cell line catalog, hereby expressly incorporated by reference.

In one embodiment, the cells may be additionally genetically engineered, that is, contain exogenous nucleic acid other than the IbA nucleic acid.

In a preferred embodiment, the IbA proteins are expressed in bacterial systems. Bacterial expression systems are well known in the art.

A suitable bacterial promoter is any nucleic acid sequence capable of binding bacterial RNA polymerase and initiating the downstream (3') transcription of the coding sequence of the IbA protein into mRNA. A bacterial promoter has a transcription initiation region which is usually placed proximal to the 5' end of the coding sequence. This transcription initiation region typically includes an RNA polymerase binding site and a transcription initiation site. Sequences encoding metabolic pathway enzymes provide particularly useful promoter sequences. Examples include promoter sequences derived from sugar metabolizing enzymes, such as galactose, lactose and maltose, and sequences derived from biosynthetic enzymes such as tryptophan. Promoters from bacteriophage may also be used and are known in the art. In addition, synthetic promoters and hybrid promoters are also useful; for example, the tac promoter is a hybrid of the trp and lac promoter sequences. Furthermore, a bacterial promoter can include naturally occurring promoters of non-bacterial origin that have the ability to bind bacterial RNA polymerase and initiate transcription.

In addition to a functioning promoter sequence, an efficient ribosome binding site is desirable. In E. coli, the ribosome binding site is called the Shine-Dalgarno (SD)
sequence and includes an initiation codon and a sequence 3-9 nucleotides in length located 3-11 nucleotides upstream of the initiation codon.

The expression vector may also include a signal peptide sequence that provides for secretion of the IbA protein in bacteria. The signal sequence typically encodes a signal peptide comprised of hydrophobic amino acids which direct the secretion of the protein from the cell, as is well known in the art. The protein is either secreted into the growth media (gram-positive bacteria) or into the periplasmic space located between the inner and outer membrane of the cell (gram-negative bacteria). For expression in bacteria, usually bacterial secretory leader sequences, operably linked to an IbA encoding nucleic acid, are preferred.

The bacterial expression vector may also include a selectable marker gene to allow for the selection of bacterial strains that have been transformed. Suitable selection genes include genes which render the bacteria resistant to drugs such as ampicillin, chloramphenicol, erythromycin, kanamycin, neomycin and tetracycline. Selectable markers also include biosynthetic genes, such as those in the histidine, tryptophan and leucine biosynthetic pathways.

These components are assembled into expression vectors. Expression vectors for bacteria are well known in the art, and include vectors for Bacillus subtilis, E. coli, Streptococcus cremoris, and Streptococcus lividans, among others.

The bacterial expression vectors are transformed into bacterial host cells using techniques well known in the art, such as calcium chloride treatment, electroporation, and others.

In one embodiment, IbA proteins are produced in insect cells. Expression vectors for the transformation of insect cells, and in particular, baculovirus-based expression vectors, are well known in the art.

In a preferred embodiment, IbA protein is produced in yeast cells. Yeast expression systems are well known in the art, and include expression vectors for Saccharomyces cerevisiae, Candida albicans and C. maltosa, Hansenula polymorpha, Kluyveromyces fragilis and K. lactis, Pichia guillerimondii and P. pastoris, Schizosaccharomyces pombe, and Yarrowia lipolytica. Preferred promoter sequences for expression in yeast include the inducible GAL1,10 promoter, the promoters from alcohol dehydrogenase, enolase, glucokinase, glucose-6-phosphate isomerase, glyceraldehyde-3-phosphate-dehydrogenase, hexokinase, phosphofructokinase, 3-phosphoglycerate mutase, pyruvate kinase, and the acid phosphatase gene. Yeast selectable markers include ADE2, HIS4, LEU2, TRP1, and ALG7, which confers resistance to tunicamycin; the neomycin phosphotransferase gene, which confers resistance to G418; and the CUP1 gene, which allows yeast to grow in the presence of copper ions.

In addition, the IbA polypeptides of the invention may be further fused to other proteins, if desired, for example to increase expression or stabilize the protein.

In one embodiment, the IbA nucleic acids, proteins and antibodies of the invention are labeled with a label other than the scaffold. By "labeled" herein is meant that a compound has at least one element, isotope or chemical compound attached to enable the detection of the compound. In general, labels fall into three classes: a) isotopic labels, which may be radioactive or heavy isotopes; b) immune labels, which may be antibodies or antigens; and c) colored or fluorescent dyes. The labels may be incorporated into the compound at any position.

Once made, the IbA proteins may be covalently modified. One type of covalent modification includes reacting targeted amino acid residues of an IbA polypeptide with an organic derivatizing agent that is capable of reacting with selected side chains or the N- or C-terminal residues of an IbA polypeptide. Derivatization with bifunctional agents is useful, for instance, for crosslinking an IbA protein to a water-insoluble support matrix or surface for use in the method for purifying anti-IbA antibodies or screening assays, as is more fully described below. Commonly used crosslinking agents include, e.g., 1,1-bis(diazoacetyl)-2-phenylethane, glutaraldehyde, N-hydroxysuccinimide esters, for example, esters with 4-azidosalicylic acid, homobifunctional imidoesters, including disuccinimidyl esters such as 3,3'-dithiobis(succinimidylpropionate), bifunctional maleimides such as bis-N-maleimido-1,8-octane and agents [...].

Other modifications include deamidation of glutaminyl and asparaginyl residues to the corresponding glutamyl and aspartyl residues, respectively, hydroxylation of proline and lysine, phosphorylation of hydroxyl groups of seryl or threonyl residues, methylation of the α-amino groups of lysine, arginine, and histidine side chains [T. E. Creighton, Proteins: Structure and Molecular Properties, W. H. Freeman & Co., San Francisco, pp. 79-86 (1983)], acetylation of the N-terminal amine, and amidation of any C-terminal carboxyl group.

Another type of covalent modification of the IbA polypeptide included within the scope of this invention comprises altering the native glycosylation pattern of the polypeptide. "Altering the native glycosylation pattern" is intended for purposes herein to mean deleting one or more carbohydrate moieties found in native sequence IbA polypeptide, and/or adding one or more glycosylation sites that are not present in the native sequence IbA polypeptide.

Addition of glycosylation sites to IbA polypeptides may be accomplished by altering the amino acid sequence thereof. The alteration may be made, for example, by the addition of, or substitution by, one or more serine or threonine residues to the native sequence IbA polypeptide (for O-linked glycosylation sites). The IbA amino acid sequence may optionally be altered through changes at the DNA level, particularly by mutating the DNA encoding the IbA polypeptide at preselected bases such that codons are generated that will translate into the desired amino acids.

Another means of increasing the number of carbohydrate moieties on the IbA polypeptide is by chemical or enzymatic coupling of glycosides to the polypeptide. Such methods are described in the art, e.g., in WO87/05330 published Sep. 11, 1987, and in Aplin and Wriston, CRC Crit. Rev. Biochem., pp. 259-306 (1981).

Removal of carbohydrate moieties present on the IbA polypeptide may be accomplished chemically or enzymatically or by mutational substitution of codons encoding for amino acid residues that serve as targets for glycosylation. Chemical deglycosylation techniques are known in the art and described, for instance, by Hakimuddin, et al., Arch. Biochem. Biophys., 259:52 (1987) and by Edge et al., Anal. Biochem., 118:131 (1981). Enzymatic cleavage of carbohydrate moieties on polypeptides can be achieved by the use of a variety of endo- and exo-glycosidases as described by Thotakura et al., Meth. Enzymol., 138:350 (1987).

Such derivatized moieties may improve the solubility, absorption, permeability across the blood brain barrier, biological half life, and the like. Such moieties or modifications of IbA polypeptides may alternatively eliminate or attenuate any possible undesirable side effect of the protein and the like. Moieties capable of mediating such effects are disclosed, for example, in Remington's Pharmaceutical Sciences, 16th ed., Mack Publishing Co., Easton, Pa. (1980).
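The DNA-level alteration described above, in which a preselected codon is mutated so that it translates into a desired residue such as serine or threonine, can be illustrated with a minimal Python sketch. This is not the applicants' procedure: the function names `translate` and `substitute_residue`, the toy 12-base coding sequence, and the use of the standard genetic code are all assumptions made for illustration only.

```python
from itertools import product

# Standard genetic code, bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a coding DNA string (reading frame 1) into one-letter residues."""
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def substitute_residue(dna: str, position: int, new_codon: str) -> str:
    """Return a copy of `dna` whose codon at 1-based residue `position`
    is replaced by `new_codon` (hypothetical helper for illustration)."""
    if new_codon not in CODON_TABLE:
        raise ValueError(f"not a valid codon: {new_codon!r}")
    start = 3 * (position - 1)
    if position < 1 or start + 3 > len(dna):
        raise IndexError("residue position outside the coding sequence")
    return dna[:start] + new_codon + dna[start + 3:]


if __name__ == "__main__":
    toy_gene = "ATGGCTAAAGGT"  # made-up sequence encoding M-A-K-G
    mutated = substitute_residue(toy_gene, 2, "TCT")  # Ala -> Ser at residue 2
    print(translate(toy_gene), "->", translate(mutated))
```

In this toy case the substitution introduces a serine at residue 2, the kind of change the text describes for adding a potential O-linked glycosylation site. Whether a given site is actually glycosylated depends on the expression host and sequence context, which the sketch does not model.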
Another type of covalent modification of IbA comprises linking the IbA polypeptide to one of a variety of nonproteinaceous polymers, e.g., polyethylene glycol, polypropylene glycol, or polyoxyalkylenes, in the manner set forth in U.S. Pat. Nos. 4,640,835; 4,496,689; 4,301,144; 4,670,417; 4,791,192 or 4,179,337.

IbA polypeptides of the present invention may also be modified in a way to form chimeric molecules comprising an IbA polypeptide fused to another, heterologous polypeptide or amino acid sequence. In one embodiment, such a chimeric molecule comprises a fusion of an IbA polypeptide with a tag polypeptide which provides an epitope to which an anti-tag antibody can selectively bind. The epitope tag is generally placed at the amino- or carboxyl-terminus of the IbA polypeptide. The presence of such epitope-tagged forms of an IbA polypeptide can be detected using an antibody against the tag polypeptide. Also, provision of the epitope tag enables the IbA polypeptide to be readily purified by affinity purification using an anti-tag antibody or another type of affinity matrix that binds to the epitope tag. In an alternative embodiment, the chimeric molecule may comprise a fusion of an IbA polypeptide with an immunoglobulin or a particular region of an immunoglobulin. For a bivalent form of the chimeric molecule, such a fusion could be to the Fc region of an IgG molecule.

Various tag polypeptides and their respective antibodies are well known in the art. Examples include poly-histidine (poly-his) or poly-histidine-glycine (poly-his-gly) tags; the flu HA tag polypeptide and its antibody 12CA5 [Field et al., Mol. Cell. Biol. 8:2159-2165 (1988)]; the c-myc tag and the 8F9, 3C7, 6E10, G4, B7 and 9E10 antibodies thereto [Evan et al., Molecular and Cellular Biology, 5:3610-3616 (1985)]; and the Herpes Simplex virus glycoprotein D (gD) tag and its antibody [Paborsky et al., Protein Engineering, 3(6):547-553 (1990)]. Other tag polypeptides include the Flag-peptide [Hopp et al., BioTechnology 6:1204-1210 (1988)]; the KT3 epitope peptide [Martin et al., Science 255:192-194 (1992)]; tubulin epitope peptide [Skinner et al., J. Biol. Chem. 266:15163-15166 (1991)]; and the T7 gene 10 protein peptide tag [Lutz-Freyermuth et al., Proc. Natl. Acad. Sci. U.S.A. 87:6393-6397 (1990)].

In a preferred embodiment, the IbA protein is purified or isolated after expression. IbA proteins may be isolated or purified in a variety of ways known to those skilled in the art depending on what other components are present in the sample. Standard purification methods include electrophoretic, molecular, immunological and chromatographic techniques, including ion exchange, hydrophobic, affinity, and reverse-phase HPLC chromatography, and chromatofocusing. For example, the IbA protein may be purified using a standard anti-library antibody column. Ultrafiltration and diafiltration techniques, in conjunction with protein concentration, are also useful. For general guidance in suitable purification techniques, see Scopes, R., Protein Purification, Springer-Verlag, NY (1982). The degree of purification necessary will vary depending on the use of the IbA protein. In some instances no purification will be necessary.

Once made, the IbA proteins and nucleic acids of the invention find use in a number of applications. In a preferred embodiment, the IbA proteins are administered to a patient to treat an IFN-β-associated disorder.

By "IFN-β associated disorder" or "IFN-β responsive disorder" or "condition" herein is meant a disorder that can be ameliorated by the administration of a pharmaceutical composition comprising an IFN-β or IbA protein, including, but not limited to, multiple sclerosis; idiopathic pulmonary fibrosis; inflammatory diseases; viral diseases; infections caused by papilloma viruses, such as genital warts and condylomata of the uterine cervix; infections caused by hepatitis viruses, such as acute/chronic hepatitis B and non-A, non-B hepatitis (hepatitis C); infections caused by herpes viruses, such as herpes genitalis, herpes zoster, herpes keratitis, and herpes simplex; viral encephalitis; cytomegalovirus pneumonia; prophylaxis of rhinovirus; cancer, including several malignant diseases such as osteosarcoma, basal cell carcinoma, cervical dysplasia, glioma, acute myeloid leukemia, multiple myeloma, Hodgkin's disease, melanoma, renal cancer, liver cancer, and breast cancer.

In a preferred embodiment, a therapeutically effective dose of an IbA protein is administered to a patient in need of treatment. By "therapeutically effective dose" herein is meant a dose that produces the effects for which it is administered. The exact dose will depend on the purpose of the treatment, and will be ascertainable by one skilled in the art using known techniques. In a preferred embodiment, dosages of about 5 µg/kg are used, administered either intravenously or subcutaneously. As is known in the art, adjustments for IbA protein degradation, systemic versus localized delivery, and rate of new protease synthesis, as well as the age, body weight, general health, sex, diet, time of administration, drug interaction and the severity of the condition may be necessary, and will be ascertainable with routine experimentation by those skilled in the art.

A "patient" for the purposes of the present invention includes both humans and other animals, particularly mammals, and organisms. Thus the methods are applicable to both human therapy and veterinary applications. In the preferred embodiment the patient is a mammal, and in the most preferred embodiment the patient is human.

The term "treatment" in the instant invention is meant to include therapeutic treatment, as well as prophylactic, or suppressive measures for the disease or disorder. Thus, for example, in the case of multiple sclerosis, successful administration of an IbA protein prior to onset of the disease results in "treatment" of the disease. As another example, successful administration of an IbA protein after clinical manifestation of the disease to combat the symptoms of the disease comprises "treatment" of the disease. "Treatment" also encompasses administration of an IbA protein after the appearance of the disease in order to eradicate the disease. Successful administration of an agent after onset and after clinical symptoms have developed, with possible abatement of clinical symptoms and perhaps amelioration of the disease, comprises "treatment" of the disease.

Those "in need of treatment" include mammals, in particular humans, already having the disease or disorder, as well as those prone to having the disease or disorder, including those in which the disease or disorder is to be prevented.

In another embodiment, a therapeutically effective dose of an IbA protein, an IbA gene, or an IbA antibody is administered to a patient having a disease involving inappropriate expression of IFN-β. A "disease involving inappropriate expression of IFN-β" within the scope of the present invention is meant to include diseases or disorders characterized by an overabundance of IFN-β. This overabundance may be due to any cause, including, but not limited to, overexpression at the molecular level, prolonged or accumulated appearance at the site of action, or increased activity of IFN-β relative to normal. Included within this definition are diseases or disorders characterized by a reduction of IFN-β. This reduction may be due to any cause, including,
but not limited to, reduced expression at the molecular level, shortened or reduced appearance at the site of action, or decreased activity of IFN-β relative to normal. Such an overabundance or reduction of IFN-β can be measured relative to normal expression, appearance, or activity of [...].

The administration of the IbA proteins of the present invention, preferably in the form of a sterile aqueous solution, can be done in a variety of ways, including, but not limited to, orally, subcutaneously, intravenously, intranasally, transdermally, intraperitoneally, intramuscularly, intrapulmonary, vaginally, rectally, or intraocularly. In some instances, for example, in the treatment of wounds, inflammation, or multiple sclerosis, the IbA protein may be directly applied as a solution or spray. Depending upon the manner of introduction, the pharmaceutical composition may be formulated in a variety of ways. The concentration of the therapeutically active IbA protein in the formulation may vary from about 0.1 to 100 weight %. In another preferred embodiment, the concentration of the IbA protein is in the range of 0.003 to 1.0 molar, with dosages from 0.03, 0.05, 0.1, 0.2, and 0.3 millimoles per kilogram of body weight being preferred.

The pharmaceutical compositions of the present invention comprise an IbA protein in a form suitable for administration to a patient. In the preferred embodiment, the pharmaceutical compositions are in a water soluble form, such as being present as pharmaceutically acceptable salts, which is meant to include both acid and base addition salts. "Pharmaceutically acceptable acid addition salt" refers to those salts that retain the biological effectiveness of the free bases and that are not biologically or otherwise undesirable, formed with inorganic acids such as hydrochloric acid, hydrobromic acid, sulfuric acid, nitric acid, phosphoric acid and the like, and organic acids such as acetic acid, propionic acid, glycolic acid, pyruvic acid, oxalic acid, maleic acid, malonic acid, succinic acid, fumaric acid, tartaric acid, citric acid, benzoic acid, cinnamic acid, mandelic acid, methanesulfonic acid, ethanesulfonic acid, p-toluenesulfonic acid, salicylic acid

Combinations of pharmaceutical compositions may be administered. Moreover, the compositions may be administered in combination with other therapeutics.

In one embodiment provided herein, antibodies, including but not limited to monoclonal and polyclonal antibodies, are [...] methods known in the art. In a preferred embodiment, these anti-IbA antibodies are used for immunotherapy. Thus, methods of immunotherapy are provided. By "immunotherapy" is meant treatment of [...] disorders with an antibody raised against an IbA protein. As used herein, immunotherapy can be passive or active. Passive immunotherapy, as defined herein, is the passive transfer of antibody to a recipient (patient). Active immunization is the induction of antibody and/or T-cell responses in a recipient (patient). Induction of an immune response can be the consequence of providing the recipient with an IbA protein antigen to which antibodies are raised. As appreciated by one of ordinary skill in the art, the IbA protein antigen may be provided by injecting an IbA polypeptide against which antibodies are desired to be raised into a recipient, or contacting the recipient with an IbA protein encoding nucleic acid, capable of expressing the IbA protein antigen, under conditions for expression of the IbA protein antigen.

In another preferred embodiment, a therapeutic compound is conjugated to an antibody, preferably an anti-IbA protein antibody. The therapeutic compound may be a cytotoxic agent. In this method, targeting the cytotoxic agent to tumor tissue or cells results in a reduction in the number of afflicted cells, thereby reducing symptoms associated with cancer, and IbA protein related disorders. Cytotoxic agents are numerous and varied and include, but are not limited to, cytotoxic drugs or toxins or active fragments of such toxins. Suitable toxins and their corresponding fragments include diphtheria A chain, exotoxin A chain, ricin A chain, abrin A chain, curcin, crotin, phenomycin, enomycin and the like. Cytotoxic agents also include radiochemicals made by conjugating radioisotopes to antibodies raised against cell cycle proteins, or binding of a radionuclide to a
and the like. "Pharmaceutically acceptable base addition 40 chelating agent that has been covalently attached to the 
salts*' include those derived from inorganic bases such as antibody. 

sodium, potassium, lithium, ammonium, calcium, In a preferred embodiment, IbA proteins are administered 

magnesium, iron, zinc, copper, manganese, aluminum salts as therapeutic agents, and can be formulated as outlined 

and the like. Particulariy preferred are the ammonium, above. Similarly, IbA genes (including both the full-length 

potassium, sodium, calcium, and magnesium salts. Salts 45 sequence, partial sequences, or regulatory sequences of the 

derived from pharmaceutically acceptable organic non-toxic IbA coding regions) can be administered in gene therapy 

bases include salts of primary, secondary, and tertiary applications, as is known in the art. These IbA genes can 

amines, substituted amines including naturally occurring include antisense applications, either as gene therapy (i.e. for 

substituted amines, cyclic amines and basic ion exchange incorporation into the genome) or as antisense compositions, 

resins, such as isopropylamine, trimethylamine, 50 as will be appreciated by those in the art. 

diethyiamine, triethylamine, tripropylamine, and ethanola- In a preferred embodiment, the nucleic acid encoding the 

™^ne. IbA proteins may also be used in gene therapy. In gene 

'Ilie pharmaceutical compositions may also include one or therapy applications, genes are introduced into cells in order 

more of the following: carrier proteins such as serum to achieve in vivo synthesis of a therapeutically effective 

albumin; buffers such as NaOAc; fillers such as microcrys- 55 genetic product, for example for replacement of a defective 

talline cellulose, lactose, corn and other starches; binding gene. "Gene therapy'* includes both conventional gene 

agents; sweeteners and other flavoring agents; coloring therapy where a lasting effect is achieved by a single 

agents; and polyethylene glycol. Additives are well known treatment, and the administration of gene therapeutic agents, 

in the art, and are used in a variety of formulations, which involves the one time or repeated administration of a 

In addition, in one embodiment, the IbA proteins of the 60 therapeutically effective DNA or mRNA. Antisense RNAs 

present invention are fonmulated using a process for phar- and DNAs can be used as therapeutic agents for blocking the 

maceutical compositions of recombinant IFN-P as described expression of certain genes in vivo. It has already been 

in U.S. Pat. No. 5,183,746 which, hereby, is expressly shown that short anbsense oligonucleotides can be imported 

incorporated in its entirety. into cells where they act as inhibitors, despite their low 

In a further embodiment, the IbA proteins are added in a 65 intracellular concentrations caused by their restricted uptake 

micellular formulation; see U.S. Pat. No. 5,833,948, hereby by the cell membrane. [Zamecnik et al.. Proc. Natl. Acad, 

expressly incorporated by reference in its entirety. Sci. U.S.A. 83:4143^146 (1986)]. The oligonucleotides 
can be modified to enhance their uptake, e.g. by substituting
their negatively charged phosphodiester groups by
uncharged groups.

[...] depending upon whether the nucleic acid is transferred
into cultured cells in vitro, or in vivo in the cells of the
intended host. Techniques suitable for the transfer of nucleic
acid into mammalian cells in vitro include the use of
liposomes, electroporation, microinjection, cell fusion,
DEAE-dextran, the calcium phosphate precipitation method,
etc. The currently preferred in vivo gene transfer techniques
include transfection with viral (typically retroviral) vectors
and viral coat protein-liposome mediated transfection [Dzau
et al., Trends in Biotechnology 11:205-210 (1993)]. In some
situations it is desirable to provide the nucleic acid source
with an agent that targets the target cells, such as an antibody
specific for a cell surface membrane protein or the target
cell, a ligand for a receptor on the target cell, etc. Where
liposomes are employed, proteins which bind to a cell
surface membrane protein associated with endocytosis may
be used for targeting and/or to facilitate uptake, e.g. capsid
proteins or fragments thereof tropic for a particular cell type,
antibodies for proteins which undergo internalization in
cycling, proteins that target intracellular localization and
enhance intracellular half-life. The technique of
receptor-mediated endocytosis is described, for example, by
Wu et al., J. Biol. Chem. 262:4429-4432 (1987); and Wagner
et al., Proc. Natl. Acad. Sci. U.S.A. 87:3410-3414 (1990). For
review of gene marking and gene therapy protocols see
Anderson et al., Science 256:808-813 (1992).

In a preferred embodiment, IbA genes are administered as
DNA vaccines, either single genes or combinations of IbA
[...] Methods for the use of genes as DNA vaccines are well
known to one of ordinary skill in the art, and include placing
an IbA gene or portion of an IbA gene under the control of
a promoter for expression in a patient in need of treatment.
The IbA gene used for DNA vaccines can encode full-length
IbA proteins, but more preferably encodes portions of the
IbA proteins including peptides derived from the IbA
protein. In a preferred embodiment a patient is immunized
with a DNA vaccine comprising a plurality of nucleotide
sequences derived from an IbA gene. Similarly, it is possible
to immunize a patient with a plurality of IbA genes or
portions thereof as defined herein. Without being bound by
theory, expression of the polypeptide encoded by the DNA
vaccine, cytotoxic T-cells, helper T-cells and antibodies are
induced which recognize and destroy or eliminate cells
expressing IFN-β proteins.

In a preferred embodiment, the DNA vaccines include a
gene encoding an adjuvant molecule with the DNA vaccine.
Such adjuvant molecules include cytokines that increase the
immunogenic response to the IbA polypeptide encoded by
the DNA vaccine. Additional or alternative adjuvants are
known to those of ordinary skill in the art and find use in the
invention.

The following examples serve to more fully describe the
manner of using the above-described invention, as well as to
set forth the best modes contemplated for carrying out
various aspects of the invention. It is understood that these
examples in no way serve to limit the true scope of this
invention, but rather are presented for illustrative purposes.
All references cited herein are incorporated by reference in
their entirety.

EXAMPLE 1

DESIGN AND CHARACTERIZATION OF
NOVEL IbA PROTEINS BY PDA

Summary: Sequences for novel interferon-beta activity
proteins (IbA proteins) were designed by simultaneously
optimizing residues in the buried core of the protein using
Protein Design Automation (PDA) as described in
WO98/47089, U.S. Ser. Nos. 09/058,459, 09/127,926,
60/104,612, [...], 60/181,630, 60/186,904, and U.S.
patent application, entitled Protein Design Automation For
Protein Libraries (Filed: Apr. 14, 2000; Inventor: Bassil
Dahiyat), all of which are expressly incorporated by
reference in their entirety. Several core designs were
completed, with 20-61 residues considered, corresponding to
20^20-20^61 sequence possibilities. Residues unexposed to
solvent were designed in order to minimize changes to the
molecular surface and to limit the potential for antigenicity
of designed novel protein analogues.

Calculations required from 12-19 hours on 16 Silicon
Graphics R10000 CPU's. The global optimum sequence
from each design was selected for characterization. From
2-11 residues were changed from human IFN-β in the
designed proteins, out of 166 residues total.

COMPUTATIONAL PROTOCOLS

Template Structure Preparation:

[...] [Karpusas et al., Proc. Natl. Acad. Sci. U.S.A.
94(22):11813-8 (1997)]. Karpusas et al. expressed human
IFN-β in CHO cells (glycosylated form) and solved the
structure by x-ray crystallography to a resolution of 2.2
Angstrom. The structure of IFN-β is dimeric, containing a
zinc ion at the interface, and both IFN-β monomers (A-chain
and B-chain) are glycosylated at asparagine 80. Although
both monomers [...] calculations were performed for the
A-chain and B-chain separately. The zinc ion, all water
molecules and the carbohydrate moiety as well as all
hydrogens present in the PDB file 1AU1 were
removed from the structure prior to the PDA calculation.

Design Strategies:

Core residues were selected for design since optimization
of these positions can improve stability, although
stabilization has been obtained from modifications at other
sites as well. Core designs also minimize changes to the
molecular surface and thus limit the designed protein's
potential for antigenicity. PDA calculations were run on 3
core sequences (see FIG. 3) and in a total of 15 core designs
(IFN-β A-chain: Core 1, Core 2, Core 2a, Core 3, Core 4,
Core 5, and Core 6; IFN-β B-chain: Core 1, Core 2, Core 2a,
Core 3, Core 4, Core 5, Core 6 and Core 7; see below).

PDA Calculations

All PDA calculations were performed with solvation [...]
Street and Mayo [Fold. Design 3:253-258 (1998)]. If
possible, Dead End Elimination (DEE) was run to
completion [...] PDA ground state. This was done for the PDA
calculations for the A-chain and B-chain of Core 1, Core 2
[...] defined below. For the calculation of Core 3, Core 4,
Core 5, Core 6 and Core 7, DEE was aborted after
the rotamer sequence space was reduced to less than 10^[...]
sequences. The DEE calculation was for all the given Core
calculation followed by Monte Carlo (MC) minimization
and a list of the 1000 lowest energy sequences was
generated.

A similar procedure was used for the B-chain, where in a
first step the side chain of Lys 33 was minimized for 50 steps
followed by an additional 50 steps of minimization of the
complete B-chain structure. As the coordinates of residues
28 to 30 are missing in the B-chain, the N-terminus of Cys
31 and the C-terminus of Arg 27 were saturated with a
hydrogen atom and the NH2-group in Cys 31 and the COOH
group in Arg 27 were kept fixed during minimization to
prevent them from moving too far away from their initial
positions.
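
The Dead End Elimination step named under "PDA Calculations" can be illustrated with a minimal sketch. This is not the PDA implementation: it applies the original DEE criterion (a rotamer is discarded when its best-case energy is still worse than the worst-case energy of a competing rotamer at the same position) to hypothetical toy energies. The function names, the energy tables and their values are all assumptions made for illustration.

```python
from itertools import product


def pair_energy(pair_e, i, r, j, s):
    """Look up the pair energy for rotamer r at position i and s at position j."""
    return pair_e[(i, j)][r][s] if i < j else pair_e[(j, i)][s][r]


def dee_prune(self_e, pair_e):
    """Return, per position, the set of rotamer indices that survive DEE.

    self_e[i][r]         : self energy of rotamer r at position i (toy values)
    pair_e[(i, j)][r][s] : pair energy between r at i and s at j, with i < j
    """
    n = len(self_e)
    alive = [set(range(len(e))) for e in self_e]
    changed = True
    while changed:
        changed = False
        for i in range(n):
            for r in list(alive[i]):
                # Best case for r: most favorable partner at every other position.
                best_r = self_e[i][r] + sum(
                    min(pair_energy(pair_e, i, r, j, s) for s in alive[j])
                    for j in range(n) if j != i)
                for t in alive[i] - {r}:
                    # Worst case for the competitor t.
                    worst_t = self_e[i][t] + sum(
                        max(pair_energy(pair_e, i, t, j, s) for s in alive[j])
                        for j in range(n) if j != i)
                    if best_r > worst_t:
                        alive[i].discard(r)  # r cannot be in the ground state
                        changed = True
                        break
    return alive


def brute_force_ground_state(self_e, pair_e):
    """Enumerate every rotamer combination; only feasible for toy problems."""
    n = len(self_e)

    def total(choice):
        e = sum(self_e[i][choice[i]] for i in range(n))
        e += sum(pair_energy(pair_e, i, choice[i], j, choice[j])
                 for i in range(n) for j in range(i + 1, n))
        return e

    return min(product(*(range(len(e)) for e in self_e)), key=total)


# Hypothetical energies for two positions with two rotamers each.
toy_self = [[0.0, 5.0], [0.0, 1.0]]
toy_pair = {(0, 1): [[0.0, 0.5], [0.2, 0.0]]}
survivors = dee_prune(toy_self, toy_pair)
ground = brute_force_ground_state(toy_self, toy_pair)
```

On this toy problem pruning leaves a single rotamer per position, which is the same combination the exhaustive search returns. In larger problems pruning may stop early, which is the situation the text describes when DEE is aborted and followed by Monte Carlo.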

Before the PDA calculations were started, an initial
preparation of the structure was performed. For the A-chain,
the side chains of Phe 50, Glu 61, Lys 115, and Met 117 were
minimized with Biograf for 50 steps using a conjugate
gradient procedure without a Coulomb potential. This was
followed by an additional 50 steps of conjugate gradient
minimization without a Coulomb potential for the complete
structure of the A-chain using Biograf. This minimization
procedure was chosen to remove initial bad contacts in the
structure.

The PDA calculations for all the designs were run using
the a2h1p0 rotamer library. This library is based on the
backbone-dependent rotamer library of Dunbrack and Karplus
(Dunbrack and Karplus, J. Mol. Biol. 230(2):543-74
(1993); hereby expressly incorporated by reference) but
includes more rotamers for the aromatic and hydrophobic
amino acids; χ1 and χ2 angle values of rotamers for all the
aromatic amino acids and χ1 angle values for all the other
hydrophobic amino acids were expanded ±1 standard
deviation about the mean value reported in the Dunbrack and
Karplus library. Typical PDA parameters were used: the van
der Waals scale factor was set to 0.9, the H-bond potential
well-depth was set to 8.0 kcal/mol, the solvation potential
was calculated using type 2 solvation with a nonpolar burial
energy of 0.048 kcal/mol and a nonpolar exposure
multiplication factor of 1.6, and the secondary structure scale
factor was set to 0.0 (secondary structure propensities were
not considered). Calculations required from 12-24 hours on 16
Silicon Graphics R10000 CPU's.
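
The rotamer expansion described above can be sketched as follows. This is an illustration only, assuming that "expanded ±1 standard deviation" means adding rotamers at mean minus one standard deviation and mean plus one standard deviation for each expanded side-chain angle, and that the hydrophobic expansion applies to the first angle only. The function name and the angle values are hypothetical and are not taken from the Dunbrack and Karplus library.

```python
from itertools import product


def expand_rotamer(chi_means, chi_sds, n_expand):
    """Expand one library rotamer about its mean side-chain angles.

    chi_means, chi_sds : mean and standard deviation of each angle (degrees)
    n_expand           : how many leading angles are expanded by +/- 1 SD
    Returns a list of angle tuples, one per expanded rotamer.
    """
    options = []
    for k, (mean, sd) in enumerate(zip(chi_means, chi_sds)):
        if k < n_expand:
            options.append((mean - sd, mean, mean + sd))
        else:
            options.append((mean,))  # angle kept at its library mean
    return list(product(*options))


# Hypothetical rotamer with two side-chain angles.
means, sds = (-60.0, 90.0), (10.0, 12.0)
aromatic = expand_rotamer(means, sds, n_expand=2)     # both angles expanded
hydrophobic = expand_rotamer(means, sds, n_expand=1)  # first angle only
polar = expand_rotamer(means, sds, n_expand=0)        # no expansion
```

Under these assumptions one library rotamer becomes nine for an aromatic residue, three for another hydrophobic residue, and stays a single rotamer otherwise, which is why the expanded library enlarges the search space mainly at core positions.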

Monte Carlo Analysis

Monte Carlo analysis of the sequences produced by PDA 
shows the ground state (optimal) amino acid and amino 
acids allowed for each variable position and their frequen- 
cies of occurrence (see FIGS. 4 through 29). 
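
The kind of table this analysis produces can be sketched as a per-position tally over a list of sequences. This is a minimal illustration, not the analysis code behind the figures; the function name and the four example sequences are hypothetical.

```python
from collections import Counter


def position_frequencies(sequences, positions):
    """Tally how often each amino acid occurs at each variable position.

    sequences : list of tuples of three-letter codes, one entry per position
    positions : residue numbers, in the same order as the tuple entries
    Returns {position: {amino_acid: fraction}}, most frequent first.
    """
    n = len(sequences)
    table = {}
    for idx, pos in enumerate(positions):
        counts = Counter(seq[idx] for seq in sequences)
        table[pos] = {aa: c / n for aa, c in counts.most_common()}
    return table


# Hypothetical low-energy sequences over two variable positions.
example_sequences = [("Ile", "Phe"), ("Ile", "Leu"), ("Val", "Phe"), ("Ile", "Phe")]
freq_table = position_frequencies(example_sequences, positions=(84, 87))
```

In this toy input Ile accounts for three of the four entries at position 84 and Phe for three of the four at position 87.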

EXAMPLE 2 

PDA Calculations for the A-chain of IFN-β

Different PDA calculations were performed for the core
region of the A-chain of IFN-β. In these calculations the
number of positions included in the PDA design was varied
and the effect of different PDA parameters on the resulting
protein sequences, especially the ground state sequence
(SEQ ID NO:4), was analyzed.

A-chain Core 1 Design 

By visual inspection, the following residues were
identified as belonging to the core of the protein: Leu 6, Gln 10,
Asn 14, Cys 17, Leu 21, Ala 55, Ala 56, Thr 58, Ile 59, Met
62, Leu 63, Ile 66, Ile 69, Phe 70, Val 84, Leu 87, Val 91,
Gln 94, Leu 98, Ser 118, Leu 122, Tyr 125, Tyr 126, Ile 129,
Leu 133, Ala 142, Trp 143, Val 146, Ile 150, Asn 153, Phe
154, Ile 157, and Leu 160. In the first calculation, Cys 17
was not included. Also excluded from the PDA design were
Phe 70, Trp 143, and Phe 154, as they are known to be
important in the stabilization of the core region, and Gln 10,
Thr 58, Gln 94, and Ser 118, as they form side
chain H-bonds. Furthermore, residues Tyr 125, Tyr 126 and
Asn 153 were not considered, as these amino acids are highly
conserved in IFN-βs from different organisms, nor was Ala
142, as its mutation to Thr is known to lead to loss of
function.



Thus, the following positions were included in the PDA 
design (see also FIG. 3): 



6    21   55   56   59   62   63   66   69   84   87   91
Leu  Leu  Ala  Ala  Ile  Met  Leu  Ile  Ile  Val  Leu  Val

98   122  129  133  146  150  157  160
Leu  Leu  Ile  Leu  Val  Ile  Ile  Leu

Met 62 was allowed to change to any PHOBIC amino acid
(Ala, Val, Leu, Ile, Phe, Tyr, Trp, Met) and the other residues
were allowed to change to Ala, Val, Leu, Ile, Phe, Tyr, or Trp.
The PDA core solvation potential was used, including the
surface area calculation.
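
The size of the amino acid sequence space implied by these choices can be worked out directly. The sketch below is only arithmetic on the positions and allowed residues listed above (eight choices at Met 62, seven at each of the other nineteen positions); the variable names are illustrative. It counts amino acid sequences, not rotamer sequences, so it understates the space actually searched.

```python
from math import log10, prod

PHOBIC = ("Ala", "Val", "Leu", "Ile", "Phe", "Tyr", "Trp", "Met")
PHOBIC_NO_MET = PHOBIC[:-1]  # the seven residues allowed at the other positions

CORE1_POSITIONS = (6, 21, 55, 56, 59, 62, 63, 66, 69, 84,
                   87, 91, 98, 122, 129, 133, 146, 150, 157, 160)

# Position 62 may take any PHOBIC residue; every other position excludes Met.
allowed = {pos: (PHOBIC if pos == 62 else PHOBIC_NO_MET) for pos in CORE1_POSITIONS}

n_sequences = prod(len(choices) for choices in allowed.values())
order_of_magnitude = log10(n_sequences)
```

The product is 8 x 7^19, a little under 10^17 sequences, which is why exhaustive enumeration is not practical even for the smallest core design.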

The PDA calculation resulted in the following ground
state sequence (SEQ ID NO:4):

6    21   55   56   59   62   63   66   69   84   87   91
Leu  Leu  Ala  Ala  Ile  Met  Leu  Ile  Ile  Ile  Phe  Val

98   122  129  133  146  150  157  160
Leu  Leu  Ile  Leu  Val  Ile  Ile  Leu

This sequence shows two mutations from the wild type
IFN-β sequence, V84I and L87F (see FIG. 4B) (SEQ ID
NO:4).
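
The mutation list can be checked mechanically by comparing the wild type residues at the design positions with the ground state sequence. The sketch below uses the Core 1 positions and residues from the two tables above; the helper name and the dictionary layout are illustrative choices.

```python
ONE_LETTER = {"Ala": "A", "Val": "V", "Leu": "L", "Ile": "I",
              "Phe": "F", "Tyr": "Y", "Trp": "W", "Met": "M"}

POSITIONS = (6, 21, 55, 56, 59, 62, 63, 66, 69, 84,
             87, 91, 98, 122, 129, 133, 146, 150, 157, 160)
WILD_TYPE = ("Leu", "Leu", "Ala", "Ala", "Ile", "Met", "Leu", "Ile", "Ile", "Val",
             "Leu", "Val", "Leu", "Leu", "Ile", "Leu", "Val", "Ile", "Ile", "Leu")
GROUND_STATE = ("Leu", "Leu", "Ala", "Ala", "Ile", "Met", "Leu", "Ile", "Ile", "Ile",
                "Phe", "Val", "Leu", "Leu", "Ile", "Leu", "Val", "Ile", "Ile", "Leu")


def list_mutations(positions, wild_type, designed):
    """Return mutations in the usual notation, e.g. 'V84I' for Val 84 to Ile."""
    mutations = []
    for pos, wt, new in zip(positions, wild_type, designed):
        if wt != new:
            mutations.append(f"{ONE_LETTER[wt]}{pos}{ONE_LETTER[new]}")
    return mutations


core1_mutations = list_mutations(POSITIONS, WILD_TYPE, GROUND_STATE)
```

For the Core 1 tables this yields V84I and L87F, in agreement with the text.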

Using the Monte Carlo technique, a list of low energy
sequences was generated. The analysis of the lowest 1000
protein sequences generated by Monte Carlo leads to the
mutation pattern shown in FIG. 4A. Thus, any protein
sequence showing mutations at the positions according to
FIG. 4A will potentially generate a more stable and active
IbA. In particular, those protein sequences found among the
list of the lowest 101 MC generated sequences (data not
shown) have a high potential to result in a more stable and
active IbA. A preferred IbA sequence is shown in FIG. 4B
(SEQ ID NO:4).
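
How a Monte Carlo search can produce a ranked list of low energy sequences is sketched below. This is a generic Metropolis sampler under stated assumptions, not the PDA procedure: the energy function is a hypothetical stand-in (the number of positions that differ from an arbitrary target), and the function name, temperature, step count and seed are illustrative.

```python
import math
import random


def monte_carlo_low_energy(start, allowed, energy, steps, keep,
                           temperature=1.0, seed=0):
    """Metropolis sampling over sequences; returns the `keep` lowest seen.

    start   : {position: amino_acid} starting sequence
    allowed : {position: tuple of amino acids permitted there}
    energy  : callable scoring a {position: amino_acid} dict (lower is better)
    """
    rng = random.Random(seed)
    current = dict(start)
    e_cur = energy(current)
    seen = {tuple(sorted(current.items())): e_cur}
    positions = sorted(allowed)
    for _ in range(steps):
        pos = rng.choice(positions)
        trial = dict(current)
        trial[pos] = rng.choice(allowed[pos])
        e_try = energy(trial)
        # Always accept downhill moves; accept uphill moves with Boltzmann weight.
        if e_try <= e_cur or rng.random() < math.exp(-(e_try - e_cur) / temperature):
            current, e_cur = trial, e_try
            seen[tuple(sorted(current.items()))] = e_cur
    ranked = sorted(seen.items(), key=lambda item: item[1])
    return ranked[:keep]


# Hypothetical toy problem: energy counts mismatches against an arbitrary target.
TARGET = {84: "Ile", 87: "Phe"}


def toy_energy(seq):
    return sum(1 for pos, aa in seq.items() if TARGET[pos] != aa)


toy_allowed = {84: ("Val", "Ile"), 87: ("Leu", "Phe")}
toy_start = {84: "Val", 87: "Leu"}
low_energy = monte_carlo_low_energy(toy_start, toy_allowed, toy_energy,
                                    steps=200, keep=3)
```

The returned list plays the role of the "lowest 1000" list in the text: each entry is a sequence with its energy, sorted from lowest to highest, ready for a per-position frequency analysis.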

A-chain Core 2 Design 

To allow more flexibility, all residues that have heavy side
chain atoms within a distance of 4 Angstrom of any heavy
side chain atom of the amino acids used in the Core 1
calculation were added to the PDA calculation. Thus, Met 1,
Gln 10, Asn 14, Cys 17, Phe 38, Phe 50, Thr 58, Glu 61, Phe
70, Glu 81, Gln 94, Ile 95, Leu 102, Lys 115, Tyr 125, Tyr
126, Leu 130, Tyr 138, Thr 144, Arg 147, Leu 151, Asn 153,
Phe 154, Arg 159, Thr 161, Tyr 163, and Leu 164 were
treated as wild type, such that the conformation of the amino
acid side chain could change but not the identity. Gln 10, Asn
14, Cys 17, Phe 38, Phe 50, Thr 58, Phe 70, Gln 94, Tyr 125,
Tyr 126, Thr 144, Asn 153, Phe 154, Thr 161, and Leu 164
were treated with the PDA core potential for surface area
calculation. Ile 95, Leu 102, Arg 147, Leu 151, and Tyr 163
were treated with the PDA boundary potential for surface
area calculation. Met 1, Glu 61, Glu 81, Lys 115, Leu 130,
Tyr 138, and Arg 159 were treated with the PDA surface
potential, but no surface area was calculated.
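
The 4 Angstrom selection rule can be sketched as a distance test between heavy side chain atoms. The coordinates below are hypothetical and chosen only to exercise the rule; the function name and the input layout are assumptions, and "within 4 Angstrom" is read here as less than or equal to the cutoff.

```python
from math import dist


def neighbors_within(side_chains, core, cutoff=4.0):
    """Find residues with a heavy side chain atom near any core side chain atom.

    side_chains : {residue: [(x, y, z), ...]} heavy side chain atom coordinates
    core        : residues already in the design (excluded from the result)
    cutoff      : distance threshold in Angstrom
    """
    core = set(core)
    found = set()
    for residue, atoms in side_chains.items():
        if residue in core:
            continue
        if any(dist(a, b) <= cutoff
               for c in core for a in atoms for b in side_chains[c]):
            found.add(residue)
    return found


# Hypothetical coordinates: one atom per residue, placed along the x axis.
toy_side_chains = {
    "Leu 6": [(0.0, 0.0, 0.0)],
    "Phe 38": [(3.5, 0.0, 0.0)],
    "Lys 33": [(10.0, 0.0, 0.0)],
}
added = neighbors_within(toy_side_chains, core={"Leu 6"})
```

With these toy coordinates only Phe 38 lies within the cutoff of the core residue, so it alone would be added to the calculation.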

Thus, the following positions were included in the PDA 
design (see also FIG. 3): 



1    6    10   14   17   21   38   50   55   56   58   59
Met  Leu  Gln  Asn  Cys  Leu  Phe  Phe  Ala  Ala  Thr  Ile

61   62   63   66   69   70   81   84   87   91   94   95
Glu  Met  Leu  Ile  Ile  Phe  Glu  Val  Leu  Val  Gln  Ile

98   102  115  122  125  126  129  130  133  138  144  146
Leu  Leu  Lys  Leu  Tyr  Tyr  Ile  Leu  Leu  Tyr  Thr  Val

147  150  151  153  154  157  159  160  161  163  164
Arg  Ile  Leu  Asn  Phe  Ile  Arg  Leu  Thr  Tyr  Leu

The PDA calculation resulted in the following ground
state sequence (SEQ ID NO:5):

1    6    10   14   17   21   38   50   55   56   58   59
Met  Leu  Gln  Asn  Cys  Leu  Phe  Phe  Ala  Ala  Thr  Ile

61   62   63   66   69   70   81   84   87   91   94   95
Glu  Met  Leu  Ile  Ile  Phe  Glu  Ile  Leu  Ile  Gln  Ile

98   102  115  122  125  126  129  130  133  138  144  146
Phe  Leu  Lys  Ile  Tyr  Tyr  Ile  Leu  Leu  Tyr  Thr  Val

147  150  151  153  154  157  159  160  161  163  164
Arg  Ile  Leu  Asn  Phe  Leu  Arg  Leu  Thr  Tyr  Leu

This sequence shows five mutations from the wild type
sequence, V84I, V91I, L98F, L122I, and I157L (see FIG.
5B) (SEQ ID NO:5).

Using the Monte Carlo technique, a list of low energy
sequences was generated. The analysis of the lowest 1000
protein sequences generated by Monte Carlo leads to the
mutation pattern shown in FIG. 5A. Thus, any protein
sequence showing mutations at the positions according to
FIG. 5A will potentially generate a more stable and active
IbA. In particular, those protein sequences found among the
list of the lowest 101 MC generated sequences (data not
shown) have a high potential to result in a more stable and
active IbA. A preferred IbA sequence is shown in FIG. 5B
(SEQ ID NO:5).

A-chain Core 2a Design

A calculation similar to Core 2 was performed, but now all
wild type residues were treated with the PDA core potential
including the surface area calculation. This calculation
yields the same ground state sequence (SEQ ID NO:5) as
resulted from Core 2.

1    6    10   14   17   21   38   50   55   56   58   59
Met  Leu  Gln  Asn  Cys  Leu  Phe  Phe  Ala  Ala  Thr  Ile

61   62   63   66   69   70   81   84   87   91   94   95
Glu  Met  Leu  Ile  Ile  Phe  Glu  Ile  Leu  Ile  Gln  Ile

98   102  115  122  125  126  129  130  133  138  144  146
Phe  Leu  Lys  Ile  Tyr  Tyr  Ile  Leu  Leu  Tyr  Thr  Val

147  150  151  153  154  157  159  160  161  163  164
Arg  Ile  Leu  Asn  Phe  Leu  Arg  Leu  Thr  Tyr  Leu

A-chain Core 3 Design

A slightly larger core region than that used in Core 2 was
defined. The residues Ser 13, Cys 17, Gly 114, Ser 118, Ala
142, Trp 143, Phe 154, and Thr 161 were added to the PDA
design used in Core 2a and allowed to change their identity.
Ser 13, Ala 142, Trp 143, Phe 154 and Thr 161 could change
to any PHOBIC residue except methionine; Cys 17 to any
PHOBIC residue plus cysteine, but not to methionine; Gly
114 could become any PHOBIC residue plus glycine, but
not methionine; Ser 118 could become any PHOBIC residue
plus serine, but not methionine. All these eight were treated
with the PDA core potential for surface area calculation. In
addition, the following residues were added and treated as
wild type using the PDA core potential for surface area
calculation: Gln 18, Gln 72, Ser 74, Ser 76, Thr 77, Asn 90,
Tyr 132, Lys 136, and Ser 139.

Thus, the following positions were included in the PDA
design (see also FIG. 3):

1    6    10   13   14   17   18   21   38   50   55   56
Met  Leu  Gln  Ser  Asn  Cys  Gln  Leu  Phe  Phe  Ala  Ala

58   59   61   62   63   66   69   70   72   74   76   77
Thr  Ile  Glu  Met  Leu  Ile  Ile  Phe  Gln  Ser  Ser  Thr

81   84   87   90   91   94   95   98   102  114  115  118
Glu  Val  Leu  Asn  Val  Gln  Ile  Leu  Leu  Gly  Lys  Ser

122  125  126  129  130  132  133  136  138  139  142  143
Leu  Tyr  Tyr  Ile  Leu  Tyr  Leu  Lys  Tyr  Ser  Ala  Trp

144  146  147  150  151  153  154  157  159  160  161  163
Thr  Val  Arg  Ile  Leu  Asn  Phe  Ile  Arg  Leu  Thr  Tyr

164
Leu

The PDA calculation resulted in the following ground
state sequence (SEQ ID NO:6):

1    6    10   13   14   17   18   21   38   50   55   56
Met  Leu  Gln  Phe  Asn  Cys  Gln  Leu  Phe  Phe  Ala  Ala

58   59   61   62   63   66   69   70   72   74   76   77
Thr  Ile  Glu  Met  Leu  Ile  Val  Phe  Gln  Ser  Ser  Thr

81   84   87   90   91   94   95   98   102  114  115  118
Glu  Ile  Leu  Asn  Ile  Gln  Ile  Phe  Leu  Gly  Lys  Ala

122  125  126  129  130  132  133  136  138  139  142  143
Ile  Tyr  Tyr  Ile  Leu  Tyr  Leu  Lys  Tyr  Ser  Ala  Trp

144  146  147  150  151  153  154  157  159  160  161  163
Thr  Ile  Arg  Ile  Leu  Asn  Phe  Leu  Arg  Leu  Ala  Tyr

164
Leu

This sequence shows 10 mutations from the wild type
sequence, S13F, I69V, V84I, V91I, L98F, S118A, L122I,
V146I, I157L, and T161A (see FIG. 6B) (SEQ ID NO:6).

Using the Monte Carlo technique, a list of low energy
sequences was generated. The analysis of the lowest 1000
protein sequences generated by Monte Carlo leads to the
mutation pattern shown in FIG. 6A. Thus, any protein
sequence showing mutations at the positions according to
FIG. 6A will potentially generate a more stable and active
IbA. In particular, those protein sequences found among the
list of the lowest 101 MC generated sequences (data not
shown) have a high potential to result in a more stable and
active IbA. Preferred IbA sequences are shown in FIGS. 6B,
6C, and 6D (SEQ ID NOS:6-8).

A-chain Core 4 Design 






The newly added residues Ser 13, Cys 17, Ser 118, and
Thr 161 were now allowed to change to any of the following
amino acids: Ala, Val, Leu, Ile, Phe, Tyr, Trp, Asp, Asn, Glu,
Gln, Lys, Ser, Thr, His, and Arg, but they were still treated
with the PDA core potential for surface area calculation.
Otherwise this calculation is identical to Core 3.

The PDA calculation resulted in the following ground
state sequence (SEQ ID NO:9):



1    6    10   13   14   17   18   21   38   50   55   56
Met  Leu  Gln  Phe  Asn  Asp  Gln  Leu  Phe  Phe  Ala  Ala

58   59   61   62   63   66   69   70   72   74   76   77
Thr  Ile  Glu  Met  Leu  Ile  Val  Phe  Gln  Ser  Ser  Thr

81   84   87   90   91   94   95   98   102  114  115  118
Glu  Ile  Leu  Asn  Ile  Gln  Ile  Phe  Leu  Gly  Lys  Ala

122  125  126  129  130  132  133  136  138  139  142  143
Ile  Tyr  Tyr  Ile  Leu  Tyr  Leu  Lys  Tyr  Ser  Ala  Trp

144  146  147  150  151  153  154  157  159  160  161  163
Thr  Ile  Arg  Ile  Leu  Asn  Phe  Leu  Arg  Leu  Ala  Tyr

164
Leu



This sequence shows 11 mutations from the wild type 
sequence, S13F, C17D, I69V, V84I, V91I, L98F, S118A, 
L122I, V146I, I157L, and T161A (see FIG. 7B) (SEQ ID 
NO: 9). 

Using the Monte Carlo technique, a list of low energy
sequences was generated. The analysis of the lowest 1000
protein sequences generated by Monte Carlo leads to the
mutation pattern shown in FIG. 7A. Thus, any protein
sequence showing mutations at the positions according to
FIG. 7A will potentially generate a more stable and active
IbA. In particular, those protein sequences found among the
list of the lowest 101 MC generated sequences (data not
shown) have a high potential to result in a more stable and
active IbA. Preferred IbA sequences are shown in FIGS. 7B,
7C, and 7D (SEQ ID NOS:9-11).

A-chain Core 5 Design 

A slightly different change in the identities of the amino
acids than in the Core 4 calculation was now allowed. Leu 6,
Leu 21, Ala 55, Ala 56, Ile 59, Leu 63, Ile 66, Ile 69, Val 84,
Val 91, Leu 122, Ile 129, Leu 133, Ala 142, Trp 143, Val 146,
Ile 150, Phe 154, Ile 157, and Leu 160 could change to any
PHOBIC residue except methionine. Met 62 was allowed to
change to any PHOBIC amino acid residue; Leu 87, Leu 98,
and Gly 114 were allowed to change to Ala, Val, Leu, Ile, or
Gly; and Ser 13, Cys 17, Ser 118, and Thr 161 could change
to Ala, Gly, Ser, Thr, Glu, Asp, Gln, Asn, or Cys. All the
other residues were treated as wild type, as was done in the
Core 4 calculation.

The PDA calculation resulted in the following ground
state sequence (SEQ ID NO:12):

1    6    10   13   14   17   18   21   38   50   55   56
Met  Leu  Gln  Glu  Asn  Asp  Gln  Leu  Phe  Phe  Ala  Ala

58   59   61   62   63   66   69   70   72   74   76   77
Thr  Ile  Glu  Met  Leu  Ile  Ile  Phe  Gln  Ser  Ser  Thr

81   84   87   90   91   94   95   98   102  114  115  118
Glu  Ile  Leu  Asn  Ile  Gln  Ile  Leu  Leu  Gly  Lys  Ser

122  125  126  129  130  132  133  136  138  139  142  143
Leu  Tyr  Tyr  Ile  Leu  Tyr  Leu  Lys  Tyr  Ser  Ala  Trp

144  146  147  150  151  153  154  157  159  160  161  163
Thr  Ile  Arg  Ile  Leu  Asn  Phe  Ile  Arg  Leu  Cys  Tyr

164
Leu

This sequence shows 7 mutations from the wild type
sequence, S13E, C17D, V84I, V91I, S118C, V146I, and
T161C (see FIG. 8B) (SEQ ID NO:12).

Using the Monte Carlo technique, a list of low energy
sequences was generated. The analysis of the lowest 1000
protein sequences generated by Monte Carlo leads to the
mutation pattern shown in FIG. 8A. Thus, any protein
sequence showing mutations at the positions according to
FIG. 8A will potentially generate a more stable and active
IbA. In particular, those protein sequences found among the
list of the lowest 101 MC generated sequences (data not
shown) have a high potential to result in a more stable and
active IbA. Preferred IbA sequences are shown in FIGS. 8B,
8C, and 8D (SEQ ID NOS:12-14). A DNA library can be
generated to mirror the probability table of FIG. 8A that
comprises at least one sequence that is more stable and/or
active than wild type IFN-β.

A-chain Core 6 Design

A similar calculation to Core 5 was performed where now
at positions 13, 17, 113, and 117 no cysteine was allowed to
occur.

The PDA calculation resulted in the following ground
state sequence (SEQ ID NO:15):

1    6    10   13   14   17   18   21   38   50   55   56
Met  Leu  Gln  Glu  Asn  Asp  Gln  Leu  Phe  Phe  Ala  Ala

58   59   61   62   63   66   69   70   72   74   76   77
Thr  Ile  Glu  Met  Leu  Ile  Val  Phe  Gln  Ser  Ser  Thr

81   84   87   90   91   94   95   98   102  114  115  118
Glu  Ile  Leu  Asn  Ile  Gln  Ile  Leu  Leu  Gly  Lys  Asn

122  125  126  129  130  132  133  136  138  139  142  143
Ile  Tyr  Tyr  Ile  Leu  Tyr  Leu  Lys  Tyr  Ser  Ala  Trp

144  146  147  150  151  153  154  157  159  160  161  163
Thr  Ile  Arg  Ile  Leu  Asn  Phe  Leu  Arg  Leu  Ala  Tyr

164
Leu

This sequence shows 10 mutations from the wild type
sequence, S13E, C17D, I69V, V84I, V91I, S118A, L122I,
V146I, I157L, and T161A (see FIG. 9B) (SEQ ID NO:15).

Using Monte Carlo technique a list of low energy 
sequences was generated. The analysis of the lowest 1000 
protein sequences generated by Monte Carlo leads to the 
mutation pattern shown in FIG. 9A. Thus, any protein 
sequence showing mutations at the positions according to 
FIG. 9A will potentially generate a more stable and active 
IbA. In particular those protein sequences found among the 
list of the lowest 101 MC generated sequences (data not 
shown) have a high potential to result in a more stable and 
active IbA. Preferred IbA sequences are shown in FIGS. 9B, 
9C, and 9D (SEQ ID NOS:15-17). A DNA library can be
generated to mirror the probability table of FIG. 9A that
comprises at least one sequence that is more stable and/or
active than wild type IFN-β.
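
One way to picture a library that mirrors a probability table is to draw each variable position independently according to its listed frequencies. The sketch below works at the amino acid level only (the text speaks of a DNA library, and codon choice is not modelled here), and the probability values are hypothetical placeholders rather than figures from FIG. 9A. The function name and seed are illustrative.

```python
import random


def sample_library(prob_table, n_members, seed=0):
    """Draw library members position by position from a probability table.

    prob_table : {position: {amino_acid: probability}}
    n_members  : number of sequences to draw
    Returns a list of {position: amino_acid} dicts.
    """
    rng = random.Random(seed)
    positions = sorted(prob_table)
    library = []
    for _ in range(n_members):
        member = {}
        for pos in positions:
            residues = list(prob_table[pos])
            weights = [prob_table[pos][aa] for aa in residues]
            member[pos] = rng.choices(residues, weights=weights, k=1)[0]
        library.append(member)
    return library


# Hypothetical probabilities for two positions.
toy_table = {84: {"Ile": 0.7, "Val": 0.3}, 87: {"Phe": 1.0}}
toy_library = sample_library(toy_table, n_members=50)
```

Positions with a single allowed residue stay fixed in every member, while variable positions appear in roughly the proportions given by the table as the library grows.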

EXAMPLE 3 

PDA Calculations for the B-chain of IFN-β

For the B-chain, PDA calculations similar to those of the
A-chain were performed.

B-chain Core 1 Design

The same positions as for the A-chain Core 1 calculation
were used in the PDA design for the B-chain: Leu 6, Leu 21,
Ala 55, Ala 56, Ile 59, Met 62, Leu 63, Ile 66, Ile 69, Val 84,
Leu 87, Val 91, Leu 98, Leu 122, Ile 129, Leu 133, Val 146,
Ile 150, Ile 157, and Leu 160.



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The PDA calculation resulted in the following ground yields the same ground state sequence (SEQ ID N0:i9) as 
state sequence (SEQ ID N0:1 8): resulted from Core 2. 



6 21 55 56 59 62 63 66 69 84 87 91 
Leu Leu Ala Ala He Met Leu lie lie lie Phe Val 

98 122 129 133 146 150 157 160 
Leu Leu lie Leu Val He He Leu 

This sequence shows two mutations from the wild type 
IFN-p sequence, V84I and L87F, and is identical with the 
ground state sequence generated for the A-chain (see FIG. 
lOB) (SEQ ID N0:18). 

Using Monte Carlo technique a list of low energy 
sequences was generated. The analysis of the lowest 1000 
protein sequences generated by Monte Carlo leads to the 
mutation pattern shown in FIG. lOA. Thus, any protein 
sequence showing mutations at the positions according to 
FIG. lOA will potentially generate a more stable and active 
IbA. In particular those protein sequences found among the 
list of the lowest 101 MC generated sequences (data not 
shown) have a high potential to result in a more stable and 
active IbA. A preferred IbA sequence is shown in FIG. lOB 
(SEQ ID N0:18). ADNA library can be generated to mirror 
the probability table of FIG. lOA that comprises at least one 
sequence that is more stable and/or active than wild type 
IFN-p. 

B-chain Core 2 Design 

A calculation similar to that for the A-chain Core 2 design 
was performed for the B-chain. 

The PDA calculation resulted in the following ground 
state sequence (SEQ ID NO: 19): 



  1    6   10   14   17   21   38   50   55   56   58   59
Met  Leu  Gln  Asn  Cys  Leu  Phe  Phe  Ala  Leu  Thr  Ile

 61   62   63   66   69   70   81   84   87   91   94   95
Glu  Met  Phe  Ile  Ile  Phe  Glu  Ile  Phe  Ile  Gln  Ile

 98  102  115  122  125  126  129  130  133  138  144  146
Leu  Leu  Lys  Phe  Tyr  Tyr  Ile  Leu  Leu  Tyr  Thr  Val

147  150  151  153  154  157  159  160  161  163  164
Arg  Ile  Leu  Asn  Phe  Ile  Arg  Leu  Thr  Tyr  Leu

This sequence shows six mutations from the wild type
sequence, A56L, L63F, V84I, L87F, V91I, and L122F (see
FIG. 11B) (SEQ ID NO:19).

Using a Monte Carlo technique, a list of low energy
sequences was generated. The analysis of the lowest 1000
protein sequences generated by Monte Carlo leads to the
mutation pattern shown in FIG. 11A. Thus, any protein
sequence showing mutations at the positions according to
FIG. 11A will potentially generate a more stable and active
IbA. In particular, those protein sequences found among the
list of the lowest 101 MC generated sequences (data not
shown) have a high potential to result in a more stable and
active IbA. A preferred IbA sequence is shown in FIG. 11B
(SEQ ID NO:19). A DNA library can be generated to mirror
the probability table of FIG. 11A that comprises at least one
sequence that is more stable and/or active than wild type
IFN-β.

B-chain Core 2a Design 

A calculation similar to that for the A-chain Core 2a
design was performed for the B-chain. This calculation
yields the same ground state sequence (SEQ ID NO:19) as
resulted from Core 2.

  1    6   10   14   17   21   38   50   55   56   58   59
Met  Leu  Gln  Asn  Cys  Leu  Phe  Phe  Ala  Ala  Thr  Ile

 61   62   63   66   69   70   81   84   87   91   94   95
Glu  Met  Leu  Ile  Ile  Phe  Glu  Ile  Phe  Ile  Gln  Ile

 98  102  115  122  125  126  129  130  133  138  144  146
Leu  Leu  Lys  Phe  Tyr  Tyr  Ile  Leu  Leu  Tyr  Thr  Val

147  150  151  153  154  157  159  160  161  163  164
Arg  Ile  Leu  Asn  Phe  Ile  Arg  Leu  Thr  Tyr  Leu

This sequence shows six mutations from the wild type
sequence, A56L, L63F, V84I, L87F, V91I, and L122F.

B-chain Core 3 Design 

A calculation similar to that for the A-chain Core 3 was
performed for the B-chain, but instead of residue Gln 18,
Phe 15 was included in the wild type PDA residue list.

The PDA calculation resulted in the following ground 
state sequence (SEQ ID NO:20): 





  1    6   10   13   14   15   17   21   38   50   55   56
Met  Leu  Gln  Leu  Asn  Phe  Cys  Leu  Phe  Phe  Ala  Leu

 58   59   61   62   63   66   69   70   72   74   76   77
Thr  Ile  Glu  Met  Leu  Ile  Ile  Phe  Gln  Ser  Ser  Thr

 81   84   87   90   91   94   95   98  102  114  115  118
Glu  Ile  Leu  Asn  Ile  Gln  Ile  Leu  Leu  Phe  Lys  Leu

122  125  126  129  130  132  133  136  138  139  142  143
Ile  Tyr  Tyr  Ile  Leu  Tyr  Leu  Lys  Tyr  Ser  Ala  Trp

144  146  147  150  151  153  154  157  159  160  161  163
Thr  Val  Arg  Ile  Leu  Asn  Phe  Ile  Arg  Leu  Ala  Tyr

164
Leu



This sequence shows 8 mutations from the wild type
sequence, S13L, A56L, V84I, V91I, G114F, S118L, L122I,
and T161A (see FIG. 12B) (SEQ ID NO:20).

Using a Monte Carlo technique, a list of low energy
sequences was generated. The analysis of the lowest 1000
protein sequences generated by Monte Carlo leads to the
mutation pattern shown in FIG. 12A. Thus, any protein
sequence showing mutations at the positions according to
FIG. 12A will potentially generate a more stable and active
IbA. In particular, those protein sequences found among the
list of the lowest 101 MC generated sequences (data not
shown) have a high potential to result in a more stable and
active IbA. A preferred IbA sequence is shown in FIG. 12B
(SEQ ID NO:20). A DNA library can be generated to mirror
the probability table of FIG. 12A that comprises at least one
sequence that is more stable and/or active than wild type
IFN-β.

B-chain Core 4 Design 

A calculation similar to that for the A-chain Core 4 design
was performed for the B-chain, but instead of residue Gln
18, Phe 15 was included in the wild type PDA residue list.

The PDA calculation resulted in the following ground
state sequence (SEQ ID NO:21):

  1    6   10   13   14   15   17   21   38   50   55   56
Met  Leu  Gln  Leu  Asn  Phe  Ala  Leu  Phe  Phe  Ala  Leu

 58   59   61   62   63   66   69   70   72   74   76   77
Thr  Ile  Glu  Met  Leu  Ile  Ile  Phe  Gln  Ser  Ser  Thr

 81   84   87   90   91   94   95   98  102  114  115  118
Glu  Ile  Phe  Asn  Leu  Gln  Ile  Leu  Leu  Phe  Lys  Leu

122  125  126  129  130  132  133  136  138  139  142  143
Ile  Tyr  Tyr  Ile  Leu  Tyr  Leu  Lys  Tyr  Ser  Ala  Trp

144  146  147  150  151  153  154  157  159  160  161  163
Thr  Val  Arg  Ile  Leu  Asn  Phe  Ile  Arg  Leu  Glu  Tyr

164
Leu

This sequence shows 10 mutations from the wild type
sequence, S13L, C17A, A56L, V84I, L87F, V91L, G114F,
S118L, L122I, and T161E (see FIG. 13B) (SEQ ID NO:21).

Using a Monte Carlo technique, a list of low energy
sequences was generated. The analysis of the lowest 1000
protein sequences generated by Monte Carlo leads to the
mutation pattern shown in FIG. 13A. Thus, any protein
sequence showing mutations at the positions according to
FIG. 13A will potentially generate a more stable and active
IbA. In particular, those protein sequences found among the
list of the lowest 101 MC generated sequences (data not
shown) have a high potential to result in a more stable and
active IbA. A preferred IbA sequence is shown in FIG. 13B
(SEQ ID NO:21). A DNA library can be generated to mirror
the probability table of FIG. 13A that comprises at least one
sequence that is more stable and/or active than wild type
IFN-β.

B-chain Core 5 Design

A calculation similar to that for the A-chain Core 5 design
was performed for the B-chain. Now, Gln 18 was included
in the wild type PDA residue list, exactly as was done in the
Core 5 calculation for the A-chain.

The PDA calculation resulted in the following ground
state sequence (SEQ ID NO:22):

  1    6   10   13   14   17   18   21   38   50   55   56
Met  Leu  Gln  Glu  Asn  Cys  Gln  Leu  Phe  Phe  Ala  Leu

 58   59   61   62   63   66   69   70   72   74   76   77
Thr  Ile  Glu  Met  Leu  Ile  Ile  Phe  Gln  Ser  Ser  Thr

 81   84   87   90   91   94   95   98  102  114  115  118
Glu  Ile  Leu  Asn  Ile  Gln  Ile  Leu  Leu  Leu  Lys  Glu

122  125  126  129  130  132  133  136  138  139  142  143
Leu  Tyr  Tyr  Ile  Leu  Tyr  Leu  Lys  Tyr  Ser  Ala  Trp

144  146  147  150  151  153  154  157  159  160  161  163
Thr  Val  Arg  Ile  Leu  Asn  Phe  Ile  Arg  Leu  Glu  Tyr

164
Leu

This sequence shows 7 mutations from the wild type
sequence, S13E, A56L, V84I, V91I, G114L, S118E, and
T161E (see FIG. 14B) (SEQ ID NO:22).

Using a Monte Carlo technique, a list of low energy
sequences was generated. The analysis of the lowest 1000
protein sequences generated by Monte Carlo leads to the
mutation pattern shown in FIG. 14A. Thus, any protein
sequence showing mutations at the positions according to
FIG. 14A will potentially generate a more stable and active
IbA. In particular, those protein sequences found among the
list of the lowest 101 MC generated sequences (data not
shown) have a high potential to result in a more stable and
active IbA. A preferred IbA sequence is shown in FIG. 14B
(SEQ ID NO:22). A DNA library can be generated to mirror
the probability table of FIG. 14A that comprises at least one
sequence that is more stable and/or active than wild type
IFN-β.

B-chain Core 6 Design

A calculation similar to that for the A-chain Core
6 design was performed for the B-chain.

The PDA calculation resulted in the following ground
state sequence (SEQ ID NO:23):

  1    6   10   13   14   17   18   21   38   50   55   56
Met  Leu  Gln  Ser  Asn  Thr  Gln  Leu  Phe  Phe  Ala  Leu

 58   59   61   62   63   66   69   70   72   74   76   77
Thr  Ile  Glu  Met  Leu  Ile  Ile  Phe  Gln  Ser  Ser  Thr

 81   84   87   90   91   94   95   98  102  114  115  118
Glu  Ile  Leu  Asn  Ile  Gln  Ile  Leu  Leu  Leu  Lys  Glu

122  125  126  129  130  132  133  136  138  139  142  143
Leu  Tyr  Tyr  Ile  Leu  Tyr  Leu  Lys  Tyr  Ser  Ala  Trp

144  146  147  150  151  153  154  157  159  160  161  163
Thr  Val  Arg  Ile  Leu  Asn  Phe  Ile  Arg  Leu  Glu  Tyr

164
Leu

This sequence shows 7 mutations from the wild type
sequence, C17T, A56L, V84I, V91I, G114L, S118E, and
T161E (see FIG. 15B) (SEQ ID NO:23).

Using a Monte Carlo technique, a list of low energy
sequences was generated. The analysis of the lowest 1000
protein sequences generated by Monte Carlo leads to the
mutation pattern shown in FIG. 15A. Thus, any protein
sequence showing mutations at the positions according to
FIG. 15A will potentially generate a more stable and active
IbA. In particular, those protein sequences found among the
list of the lowest 101 MC generated sequences (data not
shown) have a high potential to result in a more stable and
active IbA. A preferred IbA sequence is shown in FIG. 15B
(SEQ ID NO:23). A DNA library can be generated to mirror
the probability table of FIG. 15A that comprises at least one
sequence that is more stable and/or active than wild type
IFN-β.

B-chain Core 7 Design

A calculation similar to that of the B-chain Core
6 design was performed. Now Gly 114 is treated as a wild
type residue.

The PDA calculation resulted in the following ground
state sequence (SEQ ID NO:24):



  1    6   10   13   14   17   18   21   38   50   55   56
Met  Leu  Gln  Ser  Asn  Thr  Gln  Leu  Phe  Phe  Ala  Leu

 58   59   61   62   63   66   69   70   72   74   76   77
Thr  Ile  Glu  Met  Leu  Ile  Ile  Phe  Gln  Ser  Ser  Thr

 81   84   87   90   91   94   95   98  102  114  115  118
Glu  Ile  Leu  Asn  Ile  Gln  Ile  Leu  Leu  Gly  Lys  Glu

122  125  126  129  130  132  133  136  138  139  142  143
Leu  Tyr  Tyr  Ile  Leu  Tyr  Leu  Lys  Tyr  Ser  Ala  Trp

144  146  147  150  151  153  154  157  159  160  161  163
Thr  Val  Arg  Ile  Leu  Asn  Phe  Ile  Arg  Leu  Glu  Tyr

164
Leu



This sequence shows 6 mutations from the wild type
sequence, C17T, A56L, V84I, V91I, S118E, and T161E (see
FIG. 16B) (SEQ ID NO:24). With the exception of position
114, now remaining glycine, the ground state sequence is
identical to that of Core 6 for the B-chain.

Using a Monte Carlo technique, a list of low energy
sequences was generated. The analysis of the lowest 1000
protein sequences generated by Monte Carlo leads to the
mutation pattern shown in FIG. 16A. Thus, any protein
sequence showing mutations at the positions according to
FIG. 16A will potentially generate a more stable and active
IbA. In particular, those protein sequences found among the
list of the lowest 101 MC generated sequences (data not
shown) have a high potential to result in a more stable and
active IbA. A preferred IbA sequence is shown in FIG. 16B
(SEQ ID NO:24). A DNA library can be generated to mirror
the probability table of FIG. 16A that comprises at least one
sequence that is more stable and/or active than wild type
IFN-β.
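
One way to picture a library that "mirrors" a probability table is to draw each designed position independently according to the tabulated residue frequencies. The Python sketch below does this at the protein level only; it does not model DNA synthesis or codon choice. The function name `sample_library`, the example probabilities, and the choice of positions are all hypothetical and are not taken from FIG. 16A.

```python
import random


def sample_library(prob_table, wild_type, size, seed=0):
    """Draw `size` variants; each designed position is sampled from prob_table.

    prob_table: {position: {residue: probability}}
    wild_type:  {position: residue} used as the starting point for every variant
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    library = []
    for _ in range(size):
        variant = dict(wild_type)
        for pos, dist in prob_table.items():
            residues = list(dist)
            weights = [dist[res] for res in residues]
            variant[pos] = rng.choices(residues, weights=weights, k=1)[0]
        library.append(variant)
    return library


# Hypothetical probability table for two positions (illustrative values only).
example_table = {114: {"Gly": 1.0}, 118: {"Glu": 0.6, "Ser": 0.4}}
example_wild_type = {114: "Gly", 118: "Ser"}

library = sample_library(example_table, example_wild_type, size=10)
print(len(library))
```

Independent sampling per position is a simplifying assumption; it ignores any coupling between positions that the underlying sequence list may contain.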



SEQUENCE LISTING 



<160> NUMBER OF SEQ ID NOS: 24

<210> SEQ ID NO 1
<211> LENGTH: 166
<212> TYPE: PRT
<213> ORGANISM: Homo sapiens

<400> SEQUENCE: 1

Met Ser Tyr Asn Leu Leu Gly Phe Leu Gln Arg Ser Ser Asn Phe Gln
1 5 10 15

Cys Gln Lys Leu Leu Trp Gln Leu Asn Gly Arg Leu Glu Tyr Cys Leu
20 25 30

Lys Asp Arg Met Asn Phe Asp Ile Pro Glu Glu Ile Lys Gln Leu Gln
35 40 45

Gln Phe Gln Lys Glu Asp Ala Ala Leu Thr Ile Tyr Glu Met Leu Gln
50 55 60

Asn Ile Phe Ala Ile Phe Arg Gln Asp Ser Ser Ser Thr Gly Trp Asn
65 70 75 80

Glu Thr Ile Val Glu Asn Leu Leu Ala Asn Val Tyr His Gln Ile Asn
85 90 95

His Leu Lys Thr Val Leu Glu Glu Lys Leu Glu Lys Glu Asp Phe Thr
100 105 110

Arg Gly Lys Leu Met Ser Ser Leu His Leu Lys Arg Tyr Tyr Gly Arg
115 120 125

Ile Leu His Tyr Leu Lys Ala Lys Glu Tyr Ser His Cys Ala Trp Thr
130 135 140

Ile Val Arg Val Glu Ile Leu Arg Asn Phe Tyr Phe Ile Asn Arg Leu
145 150 155 160

Thr Gly Tyr Leu Arg Asn
165
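
For readers who prefer one-letter notation, a row of the listing can be converted with a simple lookup. The following Python sketch is an illustration only; the function name `to_one_letter` is hypothetical, and the example row is the first row of SEQ ID NO:1.

```python
# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(row):
    """Convert a whitespace-separated row of three-letter codes to one-letter codes."""
    return "".join(THREE_TO_ONE[token] for token in row.split())


first_row = "Met Ser Tyr Asn Leu Leu Gly Phe Leu Gln Arg Ser Ser Asn Phe Gln"
print(to_one_letter(first_row))  # MSYNLLGFLQRSSNFQ
```

An unrecognized token raises `KeyError`, which is a convenient way to flag residue codes that were damaged in transcription.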



<210> SEQ ID NO 2 

<211> LENGTH: 757

<212> TYPE: DNA 

<213> ORGANISM: Homo sapiens 



<400> SEQUENCE: 2 



atgaccaaca agtgtctcct ccaaattgct ctcctgttgt gcttctccac tacagctctt 60 
tccatgagct acaacttgct tggattccta caaagaagca gcaattttca gtgtcagaag 120 
ctcctgtggc aattgaatgg gaggcttgaa tattgcctca aggacaggat gaactttgac 180 
atccctgagg agattaagca gctgcagcag ttccagaagg aggacgccgc attgaccatc 240

tatgagatgc tccagaacat ctttgctatt ttcagacaag attcatctag cactggctgg 300 

aatgagacta ttgttgagaa cctcctggct aatgtctatc atcagataaa ccatctgaag 360 

acagtcctgg aagaaaaact ggagaaagaa gattttacca ggggaaaact catgagcagt 420 

ctgcacctga aaagatatta tgggaggatt ctgcattacc tgaaggccaa ggagtacagt 480 

cactgtgcct ggaccatagt cagagtggaa atcctaagga acttttactt cattaacaga 540 

cttacaggtt acctccgaaa ctgaagatct cctagcctgt ccctctggga ctggacaatt 600 

gcttcaagca ttcttcaacc agcagatgct gtttaagtga ctgatggcta atgtactgca 660

aatgaaagga cactagaaga ttttgaaatt tttattaaat tatgagttat ttttatttat 720 

ttaaatttta ttttggaaaa taaattattt ttggtgc 757 

<210> SEQ ID NO 3 

<211> LENGTH: 21 

<212> TYPE: PRT 

<213> ORGANISM: Homo sapiens 

<400> SEQUENCE: 3 

Met Thr Asn Lys Cys Leu Leu Gln Ile Ala Leu Leu Leu Cys Phe Ser
1 5 10 15

Thr Thr Ala Leu Ser 
20 



<210> SEQ ID NO 4 
<211> LENGTH: 166 
<212> TYPE: PRT 

<213> ORGANISM: Artificial Sequence 
<220> FEATURE: 

<223> OTHER INFORMATION: synthetic 
<400> SEQUENCE: 4 

Met Ser Tyr Asn Leu Leu Gly Phe Leu Gln Arg Ser Ser Asn Phe Gln
1 5 10 15

Cys Gln Lys Leu Leu Trp Gln Leu Asn Gly Arg Leu Glu Tyr Cys Leu
20 25 30

Lys Asp Arg Met Asn Phe Asp Ile Pro Glu Glu Ile Lys Gln Leu Gln
35 40 45

Gln Phe Gln Lys Glu Asp Ala Ala Leu Thr Ile Tyr Glu Met Leu Gln
50 55 60

Asn Ile Phe Ala Ile Phe Arg Gln Asp Ser Ser Ser Thr Gly Trp Asn
65 70 75 80

Glu Thr Ile Ile Glu Asn Phe Leu Ala Asn Val Tyr His Gln Ile Asn
85 90 95

His Leu Lys Thr Val Leu Glu Glu Lys Leu Glu Lys Glu Asp Phe Thr
100 105 110

Arg Gly Lys Leu Met Ser Ser Leu His Leu Lys Arg Tyr Tyr Gly Arg
115 120 125

Ile Leu His Tyr Leu Lys Ala Lys Glu Tyr Ser His Cys Ala Trp Thr
130 135 140

Ile Val Arg Val Glu Ile Leu Arg Asn Phe Tyr Phe Ile Asn Arg Leu
145 150 155 160

Thr Gly Tyr Leu Arg Asn
165



<210> SEQ ID NO 5 
<211> LENGTH: 166
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: synthetic
<400> SEQUENCE: 5

Met Ser Tyr Asn Leu Leu Gly Phe Leu Gln Arg Ser Ser Asn Phe Gln
1 5 10 15

Cys Gln Lys Leu Leu Trp Gln Leu Asn Gly Arg Leu Glu Tyr Cys Leu
20 25 30

Lys Asp Arg Met Asn Phe Asp Ile Pro Glu Glu Ile Lys Gln Leu Gln
35 40 45

Gln Phe Gln Lys Glu Asp Ala Ala Leu Thr Ile Tyr Glu Met Leu Gln
50 55 60

Asn Ile Phe Ala Ile Phe Arg Gln Asp Ser Ser Ser Thr Gly Trp Asn
65 70 75 80

Glu Thr Ile Ile Glu Asn Leu Leu Ala Asn Ile Tyr His Gln Ile Asn
85 90 95

His Phe Lys Thr Val Leu Glu Glu Lys Leu Glu Lys Glu Asp Phe Thr
100 105 110

Arg Gly Lys Leu Met Ser Ser Leu His Ile Lys Arg Tyr Tyr Gly Arg
115 120 125

Ile Leu His Tyr Leu Lys Ala Lys Glu Tyr Ser His Cys Ala Trp Thr
130 135 140

Ile Val Arg Val Glu Ile Leu Arg Asn Phe Tyr Phe Leu Asn Arg Leu
145 150 155 160

Thr Gly Tyr Leu Arg Asn
165



<210> SEQ ID NO 6 
<211> LENGTH: 166 
<212> TYPE: PRT 

<213> ORGANISM: Artificial Sequence 
<220> FEATURE: 

<223> OTHER INFORMATION: synthetic 
<400> SEQUENCE: 6 

Met Ser Tyr Asn Leu Leu Gly Phe Leu Gln Arg Ser Phe Asn Phe Gln
1 5 10 15

Cys Gln Lys Leu Leu Trp Gln Leu Asn Gly Arg Leu Glu Tyr Cys Leu
20 25 30

Lys Asp Arg Met Asn Phe Asp Ile Pro Glu Glu Ile Lys Gln Leu Gln
35 40 45

Gln Phe Gln Lys Glu Asp Ala Ala Leu Thr Ile Tyr Glu Met Leu Gln
50 55 60

Asn Ile Phe Ala Val Phe Arg Gln Asp Ser Ser Ser Thr Gly Trp Asn
65 70 75 80

Glu Thr Ile Ile Glu Asn Leu Leu Ala Asn Ile Tyr His Gln Ile Asn
85 90 95

His Phe Lys Thr Val Leu Glu Glu Lys Leu Glu Lys Glu Asp Phe Thr
100 105 110

Arg Gly Lys Leu Met Ala Ser Leu His Ile Lys Arg Tyr Tyr Gly Arg
115 120 125

Ile Leu His Tyr Leu Lys Ala Lys Glu Tyr Ser His Cys Ala Trp Thr
130 135 140

Ile Ile Arg Val Glu Ile Leu Arg Asn Phe Tyr Phe Leu Asn Arg Leu
145 150 155 160

Ala Gly Tyr Leu Arg Asn
165



<210> SEQ ID NO 7 
<211> LENGTH: 166 
<212> TYPE: PRT 

<213> ORGANISM: Artificial Sequence 
<220> FEATURE: 

<223> OTHER INFORMATION: synthetic 
<400> SEQUENCE: 7 

Met Ser Tyr Asn Leu Leu Gly Phe Leu Gln Arg Ser Tyr Asn Phe Gln
1 5 10 15

Cys Gln Lys Leu Leu Trp Gln Leu Asn Gly Arg Leu Glu Tyr Cys Leu
20 25 30

Lys Asp Arg Met Asn Phe Asp Ile Pro Glu Glu Ile Lys Gln Leu Gln
35 40 45

Gln Phe Gln Lys Glu Asp Ala Ala Leu Thr Ile Tyr Glu Met Leu Gln
50 55 60

Asn Ile Phe Ala Val Phe Arg Gln Asp Ser Ser Ser Thr Gly Trp Asn
65 70 75 80

Glu Thr Ile Ile Glu Asn Leu Leu Ala Asn Ile Tyr His Gln Ile Asn
85 90 95

His Phe Lys Thr Val Leu Glu Glu Lys Leu Glu Lys Glu Asp Phe Thr
100 105 110

Arg Gly Lys Leu Met Val Ser Leu His Val Lys Arg Tyr Tyr Gly Arg
115 120 125

Ile Leu His Tyr Leu Lys Ala Lys Glu Tyr Ser His Cys Ala Trp Thr
130 135 140

Ile Ile Arg Val Glu Ile Leu Arg Asn Phe Tyr Phe Leu Asn Arg Leu
145 150 155 160

Ala Gly Tyr Leu Arg Asn
165



<210> SEQ ID NO 8 
<211> LENGTH: 166 
<212> TYPE: PRT 

<213> ORGANISM: Artificial Sequence 
<220> FEATURE: 

<223> OTHER INFORMATION: synthetic 
<400> SEQUENCE: 8

Met Ser Tyr Asn Leu Leu Gly Phe Leu Gln Arg Ser Phe Asn Phe Gln
1 5 10 15

Cys Gln Lys Leu Leu Trp Gln Leu Asn Gly Arg Leu Glu Tyr Cys Leu
20 25 30

Lys Asp Arg Met Asn Phe Asp Ile Pro Glu Glu Ile Lys Gln Leu Gln
35 40 45

Gln Phe Gln Lys Glu Asp Ala Ala Leu Thr Ile Tyr Glu Met Leu Gln
50 55 60

Asn Ile Phe Ala Ile Phe Arg Gln Asp Ser Ser Ser Thr Gly Trp Asn
65 70 75 80

Glu Thr Ile Ile Glu Asn Leu Leu Ala Asn Ile Tyr His Gln Ile Asn
85 90 95

His Phe Lys Thr Val Leu Glu Glu Lys Leu Glu Lys Glu Asp Phe Thr
100 105 110

Arg Gly Lys Leu Met Ala Ser Leu His Ile Lys Arg Tyr Tyr Gly Arg
115 120 125

Ile Leu His Tyr Leu Lys Ala Lys Glu Tyr Ser His Cys Ala Trp Thr
130 135 140

Ile Val Arg Val Glu Ile Leu Arg Asn Phe Tyr Phe Leu Asn Arg Leu
145 150 155 160

Ala Gly Tyr Leu Arg Asn
165



<210> SEQ ID NO 9 
<211> LENGTH: 166 
<212> TYPE: PRT 

<213> ORGANISM: Artificial Sequence 
<220> FEATURE: 

<223> OTHER INFORMATION: synthetic 
<400> SEQUENCE: 9 

Met Ser Tyr Asn Leu Leu Gly Phe Leu Gln Arg Ser Phe Asn Phe Gln
1 5 10 15

Asp Gln Lys Leu Leu Trp Gln Leu Asn Gly Arg Leu Glu Tyr Cys Leu
20 25 30

Lys Asp Arg Met Asn Phe Asp Ile Pro Glu Glu Ile Lys Gln Leu Gln
35 40 45

Gln Phe Gln Lys Glu Asp Ala Ala Leu Thr Ile Tyr Glu Met Leu Gln
50 55 60

Asn Ile Phe Ala Val Phe Arg Gln Asp Ser Ser Ser Thr Gly Trp Asn
65 70 75 80

Glu Thr Ile Ile Glu Asn Leu Leu Ala Asn Ile Tyr His Gln Ile Asn
85 90 95

His Phe Lys Thr Val Leu Glu Glu Lys Leu Glu Lys Glu Asp Phe Thr
100 105 110

Arg Gly Lys Leu Met Ala Ser Leu His Ile Lys Arg Tyr Tyr Gly Arg
115 120 125

Ile Leu His Tyr Leu Lys Ala Lys Glu Tyr Ser His Cys Ala Trp Thr
130 135 140

Ile Ile Arg Val Glu Ile Leu Arg Asn Phe Tyr Phe Leu Asn Arg Leu
145 150 155 160

Ala Gly Tyr Leu Arg Asn
165



<210> SEQ ID NO 10 
<211> LENGTH: 166 
<212> TYPE: PRT 

<213> ORGANISM: Artificial Sequence 
<220> FEATURE: 

<223> OTHER INFORMATION: synthetic 
<400> SEQUENCE: 10 

Met Ser Tyr Asn Leu Leu Gly Phe Leu Gln Arg Ser Tyr Asn Phe Gln
1 5 10 15

Asp Gln Lys Leu Leu Trp Gln Leu Asn Gly Arg Leu Glu Tyr Cys Leu
20 25 30

Lys Asp Arg Met Asn Phe Asp Ile Pro Glu Glu Ile Lys Gln Leu Gln
35 40 45

Gln Phe Gln Lys Glu Asp Ala Ala Leu Thr Ile Tyr Glu Met Leu Gln
50 55 60

Asn Ile Phe Ala Val Phe Arg Gln Asp Ser Ser Ser Thr Gly Trp Asn
65 70 75 80

Glu Thr Ile Ile Glu Asn Leu Leu Ala Asn Ile Tyr His Gln Ile Asn
85 90 95

His Phe Lys Thr Val Leu Glu Glu Lys Leu Glu Lys Glu Asp Phe Thr
100 105 110

Arg Gly Lys Leu Met Val Ser Leu His Val Lys Arg Tyr Tyr Gly Arg
115 120 125

Ile Leu His Tyr Leu Lys Ala Lys Glu Tyr Ser His Cys Ala Trp Thr
130 135 140

Ile Ile Arg Val Glu Ile Leu Arg Asn Phe Tyr Phe Leu Asn Arg Leu
145 150 155 160

Ala Gly Tyr Leu Arg Asn
165

<210> SEQ ID NO 11
<211> LENGTH: 166
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: synthetic
<400> SEQUENCE: 11

Met Ser Tyr Asn Leu Leu Gly Phe Leu Gln Arg Ser Phe Asn Phe Gln
1 5 10 15

Asp Gln Lys Leu Leu Trp Gln Leu Asn Gly Arg Leu Glu Tyr Cys Leu
20 25 30

Lys Asp Arg Met Asn Phe Asp Ile Pro Glu Glu Ile Lys Gln Leu Gln
35 40 45

Gln Phe Gln Lys Glu Asp Ala Ala Leu Thr Ile Tyr Glu Met Leu Gln
50 55 60

Asn Ile Phe Ala Ile Phe Arg Gln Asp Ser Ser Ser Thr Gly Trp Asn
65 70 75 80

Glu Thr Ile Ile Glu Asn Leu Leu Ala Asn Ile Tyr His Gln Ile Asn
85 90 95

His Phe Lys Thr Val Leu Glu Glu Lys Leu Glu Lys Glu Asp Phe Thr
100 105 110

Arg Gly Lys Leu Met Ala Ser Leu His Ile Lys Arg Tyr Tyr Gly Arg
115 120 125

Ile Leu His Tyr Leu Lys Ala Lys Glu Tyr Ser His Cys Ala Trp Thr
130 135 140

Ile Val Arg Val Glu Ile Leu Arg Asn Phe Tyr Phe Leu Asn Arg Leu
145 150 155 160

Ala Gly Tyr Leu Arg Asn
165



<210> SEQ ID NO 12 
<211> LENGTH: 166 
<212> TYPE: PRT 

<213> ORGANISM: Artificial Sequence 
<220> FEATURE: 

<223> OTHER INFORMATION: synthetic 
<400> SEQUENCE: 12 

Met Ser Tyr Asn Leu Leu Gly Phe Leu Gln Arg Ser Glu Asn Phe Gln
1 5 10 15

Asp Gln Lys Leu Leu Trp Gln Leu Asn Gly Arg Leu Glu Tyr Cys Leu
20 25 30

Lys Asp Arg Met Asn Phe Asp Ile Pro Glu Glu Ile Lys Gln Leu Gln
35 40 45

Gln Phe Gln Lys Glu Asp Ala Ala Leu Thr Ile Tyr Glu Met Leu Gln
50 55 60

Asn Ile Phe Ala Ile Phe Arg Gln Asp Ser Ser Ser Thr Gly Trp Asn
65 70 75 80

Glu Thr Ile Ile Glu Asn Leu Leu Ala Asn Ile Tyr His Gln Ile Asn
85 90 95

His Leu Lys Thr Val Leu Glu Glu Lys Leu Glu Lys Glu Asp Phe Thr
100 105 110

Arg Gly Lys Leu Met Cys Ser Leu His Leu Lys Arg Tyr Tyr Gly Arg
115 120 125

Ile Leu His Tyr Leu Lys Ala Lys Glu Tyr Ser His Cys Ala Trp Thr
130 135 140

Ile Ile Arg Val Glu Ile Leu Arg Asn Phe Tyr Phe Ile Asn Arg Leu
145 150 155 160

Cys Gly Tyr Leu Arg Asn
165



<210> SEQ ID NO 13 
<211> LENGTH: 166 
<212> TYPE: PRT 

<213> ORGANISM: Artificial Sequence 
<220> FEATURE: 

<223> OTHER INFORMATION: synthetic 
<400> SEQUENCE: 13 

Met Ser Tyr Asn Leu Leu Gly Phe Leu Gln Arg Ser Ala Asn Phe Gln
1 5 10 15

Cys Gln Lys Leu Leu Trp Gln Leu Asn Gly Arg Leu Glu Tyr Cys Leu
20 25 30

Lys Asp Arg Met Asn Phe Asp Ile Pro Glu Glu Ile Lys Gln Leu Gln
35 40 45

Gln Phe Gln Lys Glu Asp Ala Ala Leu Thr Ile Tyr Glu Met Leu Gln
50 55 60

Asn Ile Phe Ala Ile Phe Arg Gln Asp Ser Ser Ser Thr Gly Trp Asn
65 70 75 80

Glu Thr Ile Ile Glu Asn Leu Leu Ala Asn Ile Tyr His Gln Ile Asn
85 90 95

His Leu Lys Thr Val Leu Glu Glu Lys Leu Glu Lys Glu Asp Phe Thr
100 105 110

Arg Gly Lys Leu Met Cys Ser Leu His Leu Lys Arg Tyr Tyr Gly Arg
115 120 125

Ile Leu His Tyr Leu Lys Ala Lys Glu Tyr Ser His Cys Ala Trp Thr
130 135 140

Ile Ile Arg Val Glu Ile Leu Arg Asn Phe Tyr Phe Leu Asn Arg Leu
145 150 155 160

Cys Gly Tyr Leu Arg Asn
165



<210> SEQ ID NO 14
<211> LENGTH: 166
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: synthetic
<400> SEQUENCE: 14

Met Ser Tyr Asn Leu Leu Gly Phe Leu Gln Arg Ser Glu Asn Phe Gln
1 5 10 15

Asp Gln Lys Leu Leu Trp Gln Leu Asn Gly Arg Leu Glu Tyr Cys Leu
20 25 30

Lys Asp Arg Met Asn Phe Asp Ile Pro Glu Glu Ile Lys Gln Leu Gln
35 40 45

Gln Phe Gln Lys Glu Asp Ala Ala Leu Thr Ile Tyr Glu Met Leu Gln
50 55 60

Asn Ile Phe Ala Ile Phe Arg Gln Asp Ser Ser Ser Thr Gly Trp Asn
65 70 75 80

Glu Thr Ile Ile Glu Asn Leu Leu Ala Asn Ile Tyr His Gln Ile Asn
85 90 95

His Leu Lys Thr Val Leu Glu Glu Lys Leu Glu Lys Glu Asp Phe Thr
100 105 110

Arg Gly Lys Leu Met Cys Ser Leu His Leu Lys Arg Tyr Tyr Gly Arg
115 120 125

Ile Leu His Tyr Leu Lys Ala Lys Glu Tyr Ser His Cys Ala Trp Thr
130 135 140

Ile Val Arg Val Glu Ile Leu Arg Asn Phe Tyr Phe Ile Asn Arg Leu
145 150 155 160

Cys Gly Tyr Leu Arg Asn
165



<210> SEQ ID NO 15 
<211> LENGTH: 166 
<212> TYPE: PRT 

<213> ORGANISM: Artificial Sequence 
<220> FEATURE: 

<223> OTHER INFORMATION: synthetic 
<400> SEQUENCE: 15 



Met Ser Tyr Asn Leu Leu Gly Phe Leu Gln Arg Ser Glu Asn Phe Gln
1 5 10 15

Asp Gln Lys Leu Leu Trp Gln Leu Asn Gly Arg Leu Glu Tyr Cys Leu
20 25 30

Lys Asp Arg Met Asn Phe Asp Ile Pro Glu Glu Ile Lys Gln Leu Gln
35 40 45

Gln Phe Gln Lys Glu Asp Ala Ala Leu Thr Ile Tyr Glu Met Leu Gln
50 55 60

Asn Ile Phe Ala Val Phe Arg Gln Asp Ser Ser Ser Thr Gly Trp Asn
65 70 75 80

Glu Thr Ile Ile Glu Asn Leu Leu Ala Asn Ile Tyr His Gln Ile Asn
85 90 95

His Leu Lys Thr Val Leu Glu Glu Lys Leu Glu Lys Glu Asp Phe Thr
100 105 110

Arg Gly Lys Leu Met Ala Ser Leu His Ile Lys Arg Tyr Tyr Gly Arg
115 120 125

Ile Leu His Tyr Leu Lys Ala Lys Glu Tyr Ser His Cys Ala Trp Thr
130 135 140

Ile Ile Arg Val Glu Ile Leu Arg Asn Phe Tyr Phe Leu Asn Arg Leu
145 150 155 160

Ala Gly Tyr Leu Arg Asn
165



<210> SEQ ID NO 16
<211> LENGTH: 166
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: synthetic
<400> SEQUENCE: 16

Met Ser Tyr Asn Leu Leu Gly Phe Leu Gln Arg Ser Glu Asn Phe Gln
1 5 10 15

Asp Gln Lys Leu Leu Trp Gln Leu Asn Gly Arg Leu Glu Tyr Cys Leu
20 25 30

Lys Asp Arg Met Asn Phe Asp Ile Pro Glu Glu Ile Lys Gln Leu Gln
35 40 45

Gln Phe Gln Lys Glu Asp Ala Ala Leu Thr Ile Tyr Glu Met Leu Gln
50 55 60

Asn Ile Phe Ala Ile Phe Arg Gln Asp Ser Ser Ser Thr Gly Trp Asn
65 70 75 80

Glu Thr Ile Ile Glu Asn Leu Leu Ala Asn Ile Tyr His Gln Ile Asn
85 90 95

His Leu Lys Thr Val Leu Glu Glu Lys Leu Glu Lys Glu Asp Phe Thr
100 105 110

Arg Gly Lys Leu Met Ala Ser Leu His Leu Lys Arg Tyr Tyr Gly Arg
115 120 125

Ile Leu His Tyr Leu Lys Ala Lys Glu Tyr Ser His Cys Ala Trp Thr
130 135 140

Ile Ile Arg Val Glu Ile Leu Arg Asn Phe Tyr Phe Leu Asn Arg Leu
145 150 155 160

Thr Gly Tyr Leu Arg Asn
165



<210> SEQ ID NO 17 
<211> LENGTH: 166 
<212> TYPE: PRT 

<213> ORGANISM: Artificial Sequence 
<220> FEATURE:
<223> OTHER INFORMATION: synthetic
<400> SEQUENCE: 17

Met Ser Tyr Asn Leu Leu Gly Phe Leu Gln Arg Ser Glu Asn Phe Gln
1 5 10 15

Asp Gln Lys Leu Leu Trp Gln Leu Asn Gly Arg Leu Glu Tyr Cys Leu
20 25 30

Lys Asp Arg Met Asn Phe Asp Ile Pro Glu Glu Ile Lys Gln Leu Gln
35 40 45

Gln Phe Gln Lys Glu Asp Ala Ala Leu Thr Ile Tyr Glu Met Leu Gln
50 55 60

Asn Ile Phe Ala Ile Phe Arg Gln Asp Ser Ser Ser Thr Gly Trp Asn
65 70 75 80

Glu Thr Ile Ile Glu Asn Leu Leu Ala Asn Ile Tyr His Gln Ile Asn
85 90 95

His Leu Lys Thr Val Leu Glu Glu Lys Leu Glu Lys Glu Asp Phe Thr
100 105 110

Arg Gly Lys Leu Met Ala Ser Leu His Ile Lys Arg Tyr Tyr Gly Arg
115 120 125

Ile Leu His Tyr Leu Lys Ala Lys Glu Tyr Ser His Cys Ala Trp Thr
130 135 140

Ile Val Arg Val Glu Ile Leu Arg Asn Phe Tyr Phe Leu Asn Arg Leu
145 150 155 160

Ala Gly Tyr Leu Arg Asn
165



<210> SEQ ID NO 18 
<211> LENGTH: 166 
<212> TYPE: PRT 

<213> ORGANISM: Artificial Sequence 
<220> FEATURE: 

<223> OTHER INFORMATION: synthetic 
<400> SEQUENCE: 18 

Met Ser Tyr Asn Leu Leu Gly Phe Leu Gln Arg Ser Ser Asn Phe Gln
1 5 10 15

Cys Gln Lys Leu Leu Trp Gln Leu Asn Gly Arg Leu Glu Tyr Cys Leu
20 25 30

Lys Asp Arg Met Asn Phe Asp Ile Pro Glu Glu Ile Lys Gln Leu Gln
35 40 45

Gln Phe Gln Lys Glu Asp Ala Ala Leu Thr Ile Tyr Glu Met Leu Gln
50 55 60

Asn Ile Phe Ala Ile Phe Arg Gln Asp Ser Ser Ser Thr Gly Trp Asn
65 70 75 80

Glu Thr Ile Ile Glu Asn Phe Leu Ala Asn Val Tyr His Gln Ile Asn
85 90 95

His Leu Lys Thr Val Leu Glu Glu Lys Leu Glu Lys Glu Asp Phe Thr
100 105 110

Arg Gly Lys Leu Met Ser Ser Leu His Leu Lys Arg Tyr Tyr Gly Arg
115 120 125

Ile Leu His Tyr Leu Lys Ala Lys Glu Tyr Ser His Cys Ala Trp Thr
130 135 140

Ile Val Arg Val Glu Ile Leu Arg Asn Phe Tyr Phe Ile Asn Arg Leu
145 150 155 160

Thr Gly Tyr Leu Arg Asn
165



<210> SEQ ID NO 19 
<211> LENGTH: 166 
<212> TYPE: PRT 

<213> ORGANISM: Artificial Sequence 
<220> FEATURE: 

<223> OTHER INFORMATION: synthetic 
<400> SEQUENCE: 19 

Met Ser Tyr Asn Leu Leu Gly Phe Leu Gln Arg Ser Ser Asn Phe Gln
1 5 10 15

Cys Gln Lys Leu Leu Trp Gln Leu Asn Gly Arg Leu Glu Tyr Cys Leu
20 25 30

Lys Asp Arg Met Asn Phe Asp Ile Pro Glu Glu Ile Lys Gln Leu Gln
35 40 45

Gln Phe Gln Lys Glu Asp Ala Leu Leu Thr Ile Tyr Glu Met Phe Gln
50 55 60

Asn Ile Phe Ala Ile Phe Arg Gln Asp Ser Ser Ser Thr Gly Trp Asn
65 70 75 80

Glu Thr Ile Ile Glu Asn Phe Leu Ala Asn Ile Tyr His Gln Ile Asn
85 90 95

His Leu Lys Thr Val Leu Glu Glu Lys Leu Glu Lys Glu Asp Phe Thr
100 105 110

Arg Gly Lys Leu Met Ser Ser Leu His Phe Lys Arg Tyr Tyr Gly Arg
115 120 125

Ile Leu His Tyr Leu Lys Ala Lys Glu Tyr Ser His Cys Ala Trp Thr
130 135 140

Ile Val Arg Val Glu Ile Leu Arg Asn Phe Tyr Phe Ile Asn Arg Leu
145 150 155 160

Thr Gly Tyr Leu Arg Asn
165



<210> SEQ ID NO 20 
<211> LENGTH: 166 
<212> TYPE: PRT 

<213> ORGANISM: Artificial Sequence 
<220> FEATURE: 

<223> OTHER INFORMATION: synthetic 



<400> SEQUENCE: 20 

Met Ser Tyr Asn Leu Leu Gly Phe Leu Gln Arg Ser Leu Asn Phe Gln
1 5 10 15

Cys Gln Lys Leu Leu Trp Gln Leu Asn Gly Arg Leu Glu Tyr Cys Leu
20 25 30

Lys Asp Arg Met Asn Phe Asp Ile Pro Glu Glu Ile Lys Gln Leu Gln
35 40 45

Gln Phe Gln Lys Glu Asp Ala Leu Leu Thr Ile Tyr Glu Met Leu Gln
50 55 60

Asn Ile Phe Ala Ile Phe Arg Gln Asp Ser Ser Ser Thr Gly Trp Asn
65 70 75 80

Glu Thr Ile Ile Glu Asn Leu Leu Ala Asn Ile Tyr His Gln Ile Asn
85 90 95

His Leu Lys Thr Val Leu Glu Glu Lys Leu Glu Lys Glu Asp Phe Thr
100 105 110

Arg Phe Lys Leu Met Leu Ser Leu His Ile Lys Arg Tyr Tyr Gly Arg
115 120 125

Ile Leu His Tyr Leu Lys Ala Lys Glu Tyr Ser His Cys Ala Trp Thr
130 135 140

Ile Val Arg Val Glu Ile Leu Arg Asn Phe Tyr Phe Ile Asn Arg Leu
145 150 155 160

Ala Gly Tyr Leu Arg Asn
165



<210> SEQ ID NO 21 
<211> LENGTH: 166 
<212> TYPE: PRT 

<213> ORGANISM: Artificial Sequence 
<220> FEATURE: 

<223> OTHER INFORMATION: synthetic 
<400> SEQUENCE: 21 

Met Ser Tyr Asn Leu Leu Gly Phe Leu Gln Arg Ser Leu Asn Phe Gln
1 5 10 15

Ala Gln Lys Leu Leu Trp Gln Leu Asn Gly Arg Leu Glu Tyr Cys Leu
20 25 30

Lys Asp Arg Met Asn Phe Asp Ile Pro Glu Glu Ile Lys Gln Leu Gln
35 40 45

Gln Phe Gln Lys Glu Asp Ala Leu Leu Thr Ile Tyr Glu Met Leu Gln
50 55 60

Asn Ile Phe Ala Ile Phe Arg Gln Asp Ser Ser Ser Thr Gly Trp Asn
65 70 75 80

Glu Thr Ile Ile Glu Asn Phe Leu Ala Asn Leu Tyr His Gln Ile Asn
85 90 95

His Leu Lys Thr Val Leu Glu Glu Lys Leu Glu Lys Glu Asp Phe Thr
100 105 110

Arg Phe Lys Leu Met Leu Ser Leu His Ile Lys Arg Tyr Tyr Gly Arg
115 120 125

Ile Leu His Tyr Leu Lys Ala Lys Glu Tyr Ser His Cys Ala Trp Thr
130 135 140

Ile Val Arg Val Glu Ile Leu Arg Asn Phe Tyr Phe Ile Asn Arg Leu
145 150 155 160

Glu Gly Tyr Leu Arg Asn
165



<210> SEQ ID NO 22 
<211> LENGTH: 166 
<212> TYPE: PRT 

<213> ORGANISM: Artificial Sequence 
<220> FEATURE: 

<223> OTHER INFORMATION: synthetic 
<400> SEQUENCE: 22 

Met Ser Tyr Asn Leu Leu Gly Phe Leu Gln Arg Ser Glu Asn Phe Gln 
1 5 10 15 

Cys Gln Lys Leu Leu Trp Gln Leu Asn Gly Arg Leu Glu Tyr Cys Leu 
20 25 30 

Lys Asp Arg Met Asn Phe Asp Ile Pro Glu Glu Ile Lys Gln Leu Gln 
35 40 45 

Gln Phe Gln Lys Glu Asp Ala Leu Leu Thr Ile Tyr Glu Met Leu Gln 
50 55 60 

Asn Ile Phe Ala Ile Phe Arg Gln Asp Ser Ser Ser Thr Gly Trp Asn 
65 70 75 80 

Glu Thr Ile Ile Glu Asn Leu Leu Ala Asn Ile Tyr His Gln Ile Asn 
85 90 95 

His Leu Lys Thr Val Leu Glu Glu Lys Leu Glu Lys Glu Asp Phe Thr 
100 105 110 

Arg Leu Lys Leu Met Glu Ser Leu His Leu Lys Arg Tyr Tyr Gly Arg 
115 120 125 

Ile Leu His Tyr Leu Lys Ala Lys Glu Tyr Ser His Cys Ala Trp Thr 
130 135 140 

Ile Val Arg Val Glu Ile Leu Arg Asn Phe Tyr Phe Ile Asn Arg Leu 
145 150 155 160 

Glu Gly Tyr Leu Arg Asn 
165 



<210> SEQ ID NO 23 
<211> LENGTH: 166 
<212> TYPE: PRT 

<213> ORGANISM: Artificial Sequence 
<220> FEATURE: 

<223> OTHER INFORMATION: synthetic 
<400> SEQUENCE: 23 

Met Ser Tyr Asn Leu Leu Gly Phe Leu Gln Arg Ser Ser Asn Phe Gln 
1 5 10 15 

Thr Gln Lys Leu Leu Trp Gln Leu Asn Gly Arg Leu Glu Tyr Cys Leu 
20 25 30 

Lys Asp Arg Met Asn Phe Asp Ile Pro Glu Glu Ile Lys Gln Leu Gln 
35 40 45 

Gln Phe Gln Lys Glu Asp Ala Leu Leu Thr Ile Tyr Glu Met Leu Gln 
50 55 60 

Asn Ile Phe Ala Ile Phe Arg Gln Asp Ser Ser Ser Thr Gly Trp Asn 
65 70 75 80 

Glu Thr Ile Ile Glu Asn Leu Leu Ala Asn Ile Tyr His Gln Ile Asn 
85 90 95 

His Leu Lys Thr Val Leu Glu Glu Lys Leu Glu Lys Glu Asp Phe Thr 
100 105 110 

Arg Leu Lys Leu Met Glu Ser Leu His Leu Lys Arg Tyr Tyr Gly Arg 
115 120 125 

Ile Leu His Tyr Leu Lys Ala Lys Glu Tyr Ser His Cys Ala Trp Thr 
130 135 140 

Ile Val Arg Val Glu Ile Leu Arg Asn Phe Tyr Phe Ile Asn Arg Leu 
145 150 155 160 

Glu Gly Tyr Leu Arg Asn 
165 



<210> SEQ ID NO 24 
<211> LENGTH: 166 
<212> TYPE: PRT 

<213> ORGANISM: Artificial Sequence 
<220> FEATURE: 

<223> OTHER INFORMATION: synthetic 
<400> SEQUENCE: 24 

Met Ser Tyr Asn Leu Leu Gly Phe Leu Gln Arg Ser Ser Asn Phe Gln 
1 5 10 15 

Thr Gln Lys Leu Leu Trp Gln Leu Asn Gly Arg Leu Glu Tyr Cys Leu 
20 25 30 

Lys Asp Arg Met Asn Phe Asp Ile Pro Glu Glu Ile Lys Gln Leu Gln 
35 40 45 

Gln Phe Gln Lys Glu Asp Ala Leu Leu Thr Ile Tyr Glu Met Leu Gln 
50 55 60 

Asn Ile Phe Ala Ile Phe Arg Gln Asp Ser Ser Ser Thr Gly Trp Asn 
65 70 75 80 

Glu Thr Ile Ile Glu Asn Leu Leu Ala Asn Ile Tyr His Gln Ile Asn 
85 90 95 

His Leu Lys Thr Val Leu Glu Glu Lys Leu Glu Lys Glu Asp Phe Thr 
100 105 110 

Arg Gly Lys Leu Met Glu Ser Leu His Leu Lys Arg Tyr Tyr Gly Arg 
115 120 125 

Ile Leu His Tyr Leu Lys Ala Lys Glu Tyr Ser His Cys Ala Trp Thr 
130 135 140 

Ile Val Arg Val Glu Ile Leu Arg Asn Phe Tyr Phe Ile Asn Arg Leu 
145 150 155 160 

Glu Gly Tyr Leu Arg Asn 
165 



I claim: 
1. A non-naturally occurring interferon-beta activity (IbA) 
protein comprising at least fifteen amino acid substitutions 
as compared to human IFN-β protein (SEQ ID NO: 1), 
wherein said substitutions are selected from amino acid 
residues at positions 6, 13, 17, 21, 56, 59, 61, 62, 63, 66, 69, 
84, 87, 91, 98, 102, 114, 118, 122, 129, 146, 150, 154, 157, 
160, and 161, wherein said protein exhibits at least 50% of 
the biological activity of human IFN-β protein. 



2. The non-naturally occurring IbA protein according to 
claim 1, wherein said amino acid substitutions are selected 
from positions 13, 17, 56, 63, 69, 84, 87, 91, 98, 114, 118, 
122, 146, 157, and 161. 

3. The non-naturally occurring IbA protein according to 
claim 2, wherein said substitutions are selected from the 
group of substitutions consisting of S13F, S13Y, S13E, 
S13A, S13L, C17D, C17A, C17T, A56L, L63F, I69V, V84I, 
V91I, L98F, G114F, G114L, S118L, S118E, S118A, S118V, 
S118C, L122I, L122F, L122V, V146I, I157L, T161A, 
T161E, and T161C. 

4. The non-naturally occurring IbA protein according to 
claim 1 comprising substitutions at positions 13, 17, 69, 84, 
87, 91, 98, 118, 122, 146, 157, and 161. 

5. The non-naturally occurring IbA protein according to 
claim 4, wherein said substitutions are selected from the 
group of substitutions consisting of S13F, S13Y, S13E, 
S13A, C17D, I69V, V84I, L87F, V91I, L98F, S118A, 
S118V, S118C, L122I, L122F, I157L, T161A, and T161C. 

6. The non-naturally occurring IbA protein according to 
claim 1 comprising substitutions at positions 13, 17, 56, 63, 
84, 87, 91, 114, 118, 122, and 161. 

7. The non-naturally occurring IbA protein according to 
claim 6, wherein said substitutions are selected from the 
group of substitutions consisting of S13E, S13L, C17A, 
C17T, A56L, L63F, V84I, L87F, V91I, G114F, G114L, 
S118L, S118E, L122I, L122F, T161A, and T161E. 

8. A pharmaceutical composition comprising an IbA protein 
according to claim 1 and a pharmaceutical carrier. 

9. A non-naturally occurring protein according to claim 1 
wherein said biological activity is the ability to bind to an 
IFN receptor. 

10. A non-naturally occurring protein according to claim 
1 wherein said biological activity is the ability to inhibit cell 
proliferation. 

11. A non-naturally occurring protein according to claim 
1 wherein said biological activity is the ability to inhibit viral 
infections. 

12. A recombinant nucleic acid encoding the non- 
naturally occurring IbA protein of claim 1. 

13. An expression vector comprising the recombinant 
nucleic acid of claim 12. 

14. A host cell comprising the recombinant nucleic acid of 
claim 12. 

15. A host cell comprising the expression vector of claim 
13. 

16. A method of producing a non-naturally occurring IbA 
protein comprising culturing the host cell of claim 15 under 
conditions suitable for expression of said nucleic acid. 

17. The method according to claim 16 further comprising 
recovering said IbA protein. 

* * * * * 



